SINGLE ANNULUS $L^p$ ESTIMATES FOR HILBERT
TRANSFORMS ALONG VECTOR FIELDS



MICHAEL BATEMAN 

Abstract. We prove $L^p$, $p \in (1,\infty)$, estimates on the Hilbert transform
along a one-variable vector field acting on functions with frequency support
in an annulus. Estimates when $p > 2$ were proved by Lacey and Li
in [4]. This paper also contains key technical ingredients for a companion
paper [3] with Christoph Thiele in which $L^p$ estimates are established
for the full Hilbert transform.

1. Introduction

Let $v$ be a nonvanishing vector field that depends on one variable, i.e.,
$v\colon \mathbb{R}^2 \to \mathbb{R}^2 \setminus \{0\}$ and $v(x_1, x_2) = v(x_1)$. In this paper we prove $L^p$
estimates on the Hilbert transform along $v$ precomposed with frequency
restriction to an almost-annular region. More specifically, define

$$H_v f(x, y) = \mathrm{p.v.} \int \frac{f\big((x, y) - t\, v(x, y)\big)}{t}\, dt.$$

Because of the structure of the Hilbert kernel, the magnitude of $v$ is
irrelevant, provided it is nonzero. For this reason we may assume that
$v(x_1, x_2) = (1, u(x_1))$. We will further assume that the slope of $v$ is
bounded by 1. This will be helpful for some technical reasons in this paper,
but our main interest is in the action of $H_v$ on arbitrary functions (i.e.,
those not necessarily having frequency support in an annulus); in this more
general case, the operator is invariant under dilations in the vertical
variable. See [3] for more on the symmetries of this problem. This invariance
allows us to assume, in that case, that the slope of $v$ is bounded by 1.
(This is mostly a technical convenience that allows us to think of rectangles
and parallelograms as being the same kind of objects.) Since this general
problem is the primary motivation for this paper, we adopt the restriction on
the slope here as well. The general problem is addressed in a companion paper
with Christoph Thiele [3]. This paper is logically prior to the other, and is
therefore self-contained.

Fix $w > 0$, and define $\tau$ to be the trapezoid with corners
$(-\frac{1}{w}, \frac{1}{w})$, $(\frac{1}{w}, \frac{1}{w})$, $(-\frac{2}{w}, \frac{2}{w})$, $(\frac{2}{w}, \frac{2}{w})$. Also define

$$\widehat{\Pi_\tau f} = 1_\tau \hat{f}.$$

Here we prove the following

Theorem 1. Let $v$ be a vector field depending on one variable with slope
bounded by 1. Let $p \in (1,\infty)$. Then

$$\|(H_v \circ \Pi_\tau) f\|_p \lesssim \|\Pi_\tau f\|_p.$$

We remark that the estimate in this theorem is independent of the parameter
$w$ in the definition of $\tau$, which comes as no surprise given the dilation
invariance of the problem. Further, the restriction to a trapezoid specifically
is nothing to take seriously. Using the assumption on the slope of the vector
field we can already assume $\operatorname{supp} \hat{f}$ lies in a two-ended cone near the
vertical axis, because $H_v$ acts trivially on functions with frequency support
outside this cone. More precisely, if $\hat{f}$ is supported in a cone close to the
horizontal axis, then we have with the constant vector field $(1,0)$:

$$H_v f(x,y) = H_{(1,0)} f(x,y), \qquad (1.1)$$

because $H_{(1,0)}$ is a multiplier corresponding to right and left half-planes.
But $H_{(1,0)}$ is trivially bounded, justifying our claim. Finally, a trapezoid is
the restriction of the cone to a horizontal frequency band. We could have
equally well stated the theorem for functions with support in the full band,
and reduced it to the trapezoidal case. Alternatively, we could have worked
with an annular region, or an annular region intersected with a cone. Our
methods work equally well in these cases. We chose the horizontal band
(rather than an annulus) because of the special structure of one-variable
vector fields, but for other vector fields an annular region may be more
appropriate.

Perhaps the biggest contribution of this paper (aside from its applicability
to [3]) is a more streamlined and mechanized collection of two-dimensional
time-frequency tools. Building heavily on important earlier work of Lacey-Li
(see [4] and [5]), we clarify the relationship between the density-related
maximal operators (see Lemma 20) and the more classical time-frequency tools.
Specifically, a key sublemma in [1], combined with this more efficient
understanding, allows us to obtain the full range of exponents $p \in (1,\infty)$ here.
Further, although the results are stated only for one-variable vector fields, it
is clear how to combine a maximal theorem for a different vector field with
the methods of this paper. We should remark that time-frequency analysis
in two dimensions is rather less well developed than in one dimension,
with work of Lacey-Li being the only natural precursor to this paper. We
therefore strove to make the paper self-contained and to include proofs of a
number of lemmas that are standard in one dimension, but whose proofs in
the two-dimensional situation do not seem to appear in the literature.

1.1. Related work. Study of such problems is motivated by the obvious
connection to the problem of estimating the Hilbert transform on functions
that have not been Fourier-localized. Stein, for example, conjectured that
if $v$ is Lipschitz, then $H_v$ (or rather, a truncated version of it) is a bounded
operator on $L^2$. We note that when $v$ depends on only one variable, the $L^2$
boundedness of $H_v$ is a rather immediate consequence of Carleson's theorem,
as shown in [5]. Stein's conjecture is the singular integral variant of
Zygmund's well-known conjecture on the differentiation of Lipschitz vector
fields. For a fuller history, see [5]. More recently, Thiele and the author
proved a range of $L^p$ estimates on the full Hilbert transform along a
one-variable vector field, using some key lemmas from the present paper. It is
known that the operator $H_v$ is related to the return-times theorem from
ergodic theory; see [3] for more on this connection.

We remark that the operator $\mathcal{C}$ is quite similar to Carleson's operator
(i.e., the maximal Fourier partial sum operator). The argument in [4] is also
quite similar to the Lacey-Thiele proof of Carleson's theorem (see [6]). The
argument here draws on ingredients from [4], but obtaining $L^p$ estimates
for $p < 2$ in this situation requires more effort, partly because the relevant
maximal operators are more complicated, but also because making use of
the maximal theory is more complicated. In the one-dimensional situation,
exceptional sets are unions of intervals; nothing so simple is the case here.

Theorem 1 was proved for arbitrary vector fields when $p > 2$ by Lacey and
Li in [4]. (In fact, they proved a weak $L^2$ result.) The same authors, in [5],
introduced a method for obtaining $L^p$, $p < 2$, estimates on $H_v \circ \Pi_\tau$ when a
certain maximal theorem is available for the vector field $v$ in question. (The
story is a bit technical: they proved a theorem contingent on the existence
of this certain maximal theorem in the case of truncated Hilbert kernels.
However the method had little to do with the truncation of the kernel,
allowing us to extend it here.) The author proved such an $L^p$ maximal
theorem when $v$ depends on one variable in [1]. Given this result, it is
not surprising that the method from [5] yields a result for some p < 2, 
but the value of p obtained from the method in [5] seemed far from sharp. 
(At the very least, the method seemed nonsharp. Of course, this was not 
important for the authors there.) It was clear, for example that new ideas 
would be required to even reach p close to |. The author recently improved 
the estimates in this maximal theorem to (essentially) best possible in [2]. 
Because of this, the author decided to investigate the precise range of p for 
which Theorem 1 holds. 

1.2. New ideas. The novelties in this paper that allow us to obtain the full 
range of p claimed in Theorem 1 are a simplification of the approach in [5] , 
and a more efficient appeal to the maximal theorems. 

We elaborate a bit more on these points for readers already familiar with 
the argument in [5]. 

Regarding the first point: In [5], tiles are sorted into trees via standard
density and orthogonality (size) lemmas. An important additional observation
made in [5] is that if $\mathcal{T}$ is a collection of trees such that for each $T \in \mathcal{T}$
the "size" of $T$ is about $\sigma$ and the "density" of the top of $T$ is about $\delta$, then
we can control $\sum_{T \in \mathcal{T}} |\mathrm{top}(T)|$ by using an appropriate maximal theorem.
Their argument, however, requires an additional twist to handle trees with
large size whose tiles have density $\sim \delta$, but whose tops have density much
less than $\delta$. Here we use an organization of the tiles that admits a more
straightforward argument. This organization is carried out in Section 8,
which contains more discussion as well.

Regarding the second point: A rather simple observation allows us to
appeal to a key ingredient in the proof of the maximal theorem, rather than
the theorem itself. This strengthens estimates on $\sum_{T \in \mathcal{T}} |\mathrm{top}(T)|$ for trees
as mentioned in the last paragraph. This observation allows us to obtain
the full range of $p$. This observation uses the proof of [1], and hence does
not even take advantage of the sharp $L^p$ estimates on the maximal operator
obtained in [2]. See Lemma 20.



1.3. Organization of paper. Readers familiar with time-frequency anal- 
ysis, having a bit of faith, and wanting an executive summary should follow 
this outline: Skip to the definition of the model operator in Section 2.4. 
Then (possibly after skimming Section 3 to review essentially standard def- 
initions,) read Sections 4, 5, and 8. Those wanting to check the numerology 
should also read Section 6. A comprehensive outline is below. 

In Section 2, we reduce the theorem to an analogous one for a model 
operator. 

In Section 3, we present some key definitions needed for the organization 
of our set of tiles. (Recall that the operators in question are model sums 
over tiles.) 

In Section 4, we make the main decomposition of the collection of tiles 
and state several key estimates that follow from the decomposition. 

In Section 5, we state the main lemmas needed to prove the estimates 
stated in Section 4. 

In Section 6, we balance these various estimates to prove the main theo- 
rem. There is no serious content here. 

In Section 7, we prove the density lemma, which estimates $\sum_{T \in \mathcal{T}} |\mathrm{top}(T)|$
for certain collections $\mathcal{T}$ by using elementary covering ideas.

In Section 8, we prove the maximal estimate, which controls $\sum_{T \in \mathcal{T}} |\mathrm{top}(T)|$
for certain collections $\mathcal{T}$ by using more sophisticated techniques in
combination with $L^p$ and BMO-type estimates on a square function related to the
"projection" operator associated to trees.

In Section 9, we compare the size of a tree to its intersection with the 
function in the definition of size. 

In Section 10, we prove the tree lemma, which controls the contribution to 
the model sum from one tree. The proof mirrors that of the (more) classical 
one-dimensional tree lemma, with a small bit of extra work required to 
handle two-dimensional tail terms. 

In Section 11, we prove the size lemma, which estimates $\sum_{T \in \mathcal{T}} |\mathrm{top}(T)|$
for certain collections $\mathcal{T}$ by using orthogonality.

In Section 12, we prove a refined Bessel inequality that allows us to control
tail terms in the size and tree lemmas, as well as in the proof of localized
$L^p$ estimates for the square function mentioned above.

In Section 13, we prove localized (to the top of a tree) $L^p$ estimates for
a square function associated to a tree. Once again, we follow a relatively
standard argument and appeal to the refined Bessel inequality to handle
some two-dimensional technicalities.

In Section 14, we prove that higher $L^p$ norms of the square function are
controlled by lower $L^p$ ones by using standard BMO techniques.

In the appendix, we recall the proof in [4] of the $L^p$, $p > 2$, case of our
main theorem.

2. Reductions 

In this section we reduce the $L^p$ estimates in Theorem 1 to restricted
weak-type estimates on a model operator. The model operator should look
familiar to readers familiar with developments in time-frequency analysis
from the last ten to fifteen years: it is a sum over "tiles" of wave packets.
The model operator arises from decomposing

(1) the Hilbert kernel $\frac{1}{t}$ into (smoothly cut off) dyadic intervals on the
frequency side; for technical reasons we make these annuli rather
thin, resulting in two summation indices for the Hilbert kernel. In
fact, we actually decompose the projection operator onto positive
frequencies, and write the Hilbert transform as a linear combination
of this operator and the identity operator.

(2) given any integer $l \ge 0$, $\hat{f}$ on $\tau$ into $\sim 2^l$ pieces; again, the "$\sim$"
here comes from another summation introduced to provide strict
orthogonality between the various pieces.

2.1. Discretizing the kernel. In this section we decompose the operator
$H \circ \Pi_\tau$ into a sum of model operators.

We begin by selecting a Schwartz function $\psi_0^{(0)}$ such that $\hat\psi_0^{(0)}$ is supported
on a short interval and equal to $1$ on a shorter subinterval. Let
$\psi_l^{(0)}(t) = \psi_0^{(0)}(2^{-l}t)$. Now define $\psi^{(0)} = \sum_{l \in \mathbb{Z}} \psi_l^{(0)}$. By appropriately
defining $\psi_0^{(i)}$ with similarly sized support, and defining
$\psi_l^{(i)}(t) = \psi_0^{(i)}(2^{-l}t)$, we can construct a partition of
unity for $\mathbb{R}^+$; i.e.

$$1_{(0,\infty)} = \sum_{i=0}^{99} \hat\psi^{(i)}.$$

This gives us the Hilbert kernel as a linear combination of 100 model kernels
and the identity. More precisely, let

$$H_l^{(i)} g(x, y) = \int \psi_l^{(i)}(t)\, g(x - t, y - t u(x))\, dt.$$

Then writing $I$ for the identity operator,

$$c_1 H \circ \Pi_\tau f(x,y) + I \circ \Pi_\tau f(x,y) = c_2 \sum_{l \in \mathbb{Z}} \sum_{i=0}^{99} H_l^{(i)} \circ \Pi_\tau f(x, y).$$

By the triangle inequality, we have

$$\|H \circ \Pi_\tau f\|_p \lesssim \|I \circ \Pi_\tau f\|_p + \sum_{i=0}^{99} \|H^{(i)} \circ \Pi_\tau f\|_p,$$

where $H^{(i)} = \sum_l H_l^{(i)}$. We note that $H_l \circ \Pi_\tau f = 0$ whenever $l$ lies below a
threshold depending only on $w$, because of the Fourier support of the kernel
of the operator $H_l$.

2.2. Discretizing the function. We next focus on discretizing the function
$f$. For $l \ge 0$, we write $\mathcal{D}_l$ to denote the collection of dyadic intervals of
length $2^{-l}$ contained in $[-2,2]$. Fix a smooth positive function $\beta\colon \mathbb{R} \to \mathbb{R}$
such that $\beta(x) = 1$ for $x \in [-1, 1]$ and such that $\beta(x) = 0$ when $|x| \ge 2$.
Also assume that $\sqrt{\beta}$ is a smooth function. This point will become relevant
for the definition of $\varphi$ immediately before Lemma 2. Now fix an integer $c$
(whose exact value is unimportant) and for each $\omega \in \mathcal{D}_l$, define

$$\beta_\omega(x) = \beta(2^{l+c}(x - c_{\omega_1})),$$

where $\omega_1$ is the right half of $\omega$, and $c_{\omega_1}$ is the center of $\omega_1$. Define

$$\beta_l(x) = \sum_{\omega \in \mathcal{D}_l} \beta_\omega(x).$$

Note that

$$\beta_l(x + 2^{-l}) = \beta_l(x)$$

for $x \in [-2, 2 - 2^{-l}]$. Now define

$$\gamma_l(x) = \frac{1}{2} \int_{-1}^{1} \beta_l(x + t)\, dt.$$

Because of the local periodicity mentioned above, we have that $\gamma_l(x)$ is
constant for $x \in [-1,1]$; say $\gamma_l(x) = \delta$, where $\delta$ is a constant independent of
$l$. Hence

$$\frac{1}{\delta} \gamma_l(x) 1_{[-1,1]}(x) = 1_{[-1,1]}(x).$$
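
The local periodicity of $\beta_l$ used above can be checked numerically. The following Python sketch is illustrative only: the particular bump `beta`, the helper `smooth_step`, and the default value `c = 3` are assumptions made for the demonstration (the text fixes no specific $\beta$ or $c$), and the smoothness of $\sqrt{\beta}$ is not examined.

```python
import math


def smooth_step(t):
    """C-infinity step: equals 0 for t <= 0 and 1 for t >= 1."""
    def g(u):
        return math.exp(-1.0 / u) if u > 0 else 0.0
    return g(t) / (g(t) + g(1.0 - t))


def beta(x):
    """Assumed bump: equals 1 on [-1, 1] and vanishes for |x| >= 2."""
    return smooth_step(2.0 - abs(x))


def beta_l(x, l, c=3):
    """Sum of beta(2^(l+c) (x - center)) over dyadic intervals of length
    2^-l inside [-2, 2], where center is the midpoint of the right half
    of each interval."""
    h = 2.0 ** (-l)
    total = 0.0
    for k in range(-2 ** (l + 1), 2 ** (l + 1)):
        center = (k + 0.75) * h  # midpoint of the right half of [k h, (k+1) h)
        total += beta(2.0 ** (l + c) * (x - center))
    return total
```

With `l = 2`, evaluating `beta_l(x, 2)` and `beta_l(x + 0.25, 2)` at points `x` in `[-2, 1.75]` should give matching values up to rounding, which is the periodicity that makes the average $\gamma_l$ constant on $[-1,1]$.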

Define yet another multiplier $\tilde\beta\colon \mathbb{R} \to \mathbb{R}$ with support in a bounded interval
containing $[1,2]$, and $\tilde\beta(x) = 1$ for $x \in [1,2]$. Just as $\gamma_l$ is an average over
translates of $\beta_l$, so each $H^{(i)}$ is an average of model operators. We define the
corresponding multipliers on $\mathbb{R}^2$:

We know that for each $l$

for $(\xi, \eta) \in \tau$. Note that for each $i$,

||/r«(iW)ll P = ||^(^ (i) on r )(im,*/)|| p 



i 



= \\\ J ^ H t ] ^(-^^ f)dt\\ p 
^ \ /jlE^ 011 -)^*/)!!^ 



so it is enough to consider the discretized projections. In what follows,
we will assume, without loss of generality, that $t = 0 = i$ and omit the
dependence on $t$ and $i$.

2.3. Constructing the tiles. For each $\omega \in \mathcal{D}_l$ with $l \ge 0$, let $\mathcal{U}_\omega$ be a
partition of $\mathbb{R}^2$ by parallelograms of width $w$ and length $\frac{w}{|\omega|}$ whose long side
has slope $\theta$, where $\tan\theta = c(\omega)$ and where $c(\omega)$ is the center of the interval
$\omega$, and whose projection onto the $x$-axis is a dyadic interval. We remark that
$l < 0$ need not be considered. (See the remark immediately prior to Section
2.2. Note that the index $l$ plays a slightly different role there.) Briefly, the
parts of the Hilbert kernel whose frequency support lies outside an interval
of length comparable to $\frac{1}{w}$ about the origin have no interaction with our
function $f$, whose frequency support is contained in the annulus of radius $\frac{1}{w}$.
Finally, let $\mathcal{U} = \bigcup_{\omega \in \mathcal{D}} \mathcal{U}_\omega$. If $s \in \mathcal{U}_\omega$, we will write $\omega_s := \omega$.

An element of $\mathcal{U}$ is called a "tile". The following lemma, stated in
essentially this form in [4], allows us to further discretize our operator into
a sum over tiles. Fix an element of $\mathcal{U}_\omega$ containing the origin. Suppose
$\varphi_\omega$ is such that $|\hat\varphi_\omega|^2 = \hat{m}_\omega$. Note that $\varphi_\omega$ is smooth, by our assumption on
the function $\beta$ mentioned above. Further, each region

El):^, r?G[l,2]} 

can be obtained by a linear transformation of the trapezoid with corners
$(-1, 1)$, $(1, 1)$, $(-2, 2)$, $(2, 2)$, which ensures that the functions $\varphi_\omega$, with
$\omega \in \mathcal{D} := \bigcup_{l \ge 0} \mathcal{D}_l$, satisfy uniform decay conditions. To see this, consider the
transformations



B = 

and 

C = 



M 








M 






(o 








ft 


i) 

A composition of these three takes the trapezoid bounded by $(-1, 1)$, $(1, 1)$,
$(-2, 2)$, $(2, 2)$ to the trapezoid bounded by $(M(\epsilon + \lambda), M)$, $(M(-\epsilon + \lambda), M)$,
$(2M(\epsilon + \lambda), 2M)$, $(2M(-\epsilon + \lambda), 2M)$, which is precisely the area of support
for $\hat\varphi_\omega$ when $M$, $\epsilon$, and $\lambda$ are chosen appropriately. Define

$$\varphi_s(p) = \sqrt{|s|}\, \varphi_\omega(p - c(s)).$$

Note that the functions $m_\omega$ are $L^1$ normalized, so the functions $\varphi_s$ are $L^2$
normalized.
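
The image of the reference trapezoid stated above can be checked with a small computation. The Python sketch below assumes one particular factorization into three linear maps (a horizontal scaling by `eps`, a horizontal shear by `lam`, and a dilation by `M`); these three matrices are a hypothetical choice that reproduces the stated corners as a set, and are not claimed to be the maps originally displayed.

```python
def apply(matrix, point):
    """Apply a 2x2 matrix, given as a pair of rows, to a point (x, y)."""
    (a, b), (c, d) = matrix
    x, y = point
    return (a * x + b * y, c * x + d * y)


def image_of_trapezoid(M, eps, lam):
    """Corners of the image of the trapezoid (-1,1), (1,1), (-2,2), (2,2)."""
    A = ((eps, 0.0), (0.0, 1.0))   # assumed: horizontal scaling by eps
    B = ((1.0, lam), (0.0, 1.0))   # assumed: horizontal shear by lam
    C = ((M, 0.0), (0.0, M))       # assumed: isotropic dilation by M
    corners = [(-1.0, 1.0), (1.0, 1.0), (-2.0, 2.0), (2.0, 2.0)]
    return [apply(C, apply(B, apply(A, p))) for p in corners]
```

Under this choice a point $(x, y)$ is sent to $(M(\epsilon x + \lambda y), My)$, so the four corners land on $(M(\pm\epsilon + \lambda), M)$ and $(2M(\pm\epsilon + \lambda), 2M)$.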

Lemma 2. Using notation above, we have

$$f * m_\omega(x) = \lim_{N \to \infty} \frac{1}{4N^2} \int_{[-N,N]^2} \sum_{s \in \mathcal{U}_\omega} \langle f, \varphi_s(p + \cdot) \rangle\, \varphi_s(p + x)\, dp.$$

Proof. We compute directly: 

f*m u (x) = / f(z) <Pwip)<Pw(jP + x - z)dpdz 
Jzm 2 J P m 2 

= / f( z )y2 / <Pu(P + z)<Pu(p + x)dpdz 
= ^2 / f( z ) i Pu(p + z)dzip w {p + x)dp 
= j (f, ( Pu(p + -)) ( Pu(p + x)dp 

= nn I (f><Ps(p + -))<p.s(p + x)dp 

sew. lUujl J v eR " 
= J im 7^?2 / (f> ( P*(P + -)) ( Ps(p + x)dp. 

To see the last equality, note that the integrand is periodic in $p$, and the
error (which arises from the fact that $[-N, N]^2$ will not exactly agree with
the boundaries of the tiles $s$) goes to zero as $N \to \infty$. □

This lemma allows us to conclude (using the dominated convergence theorem)
that

$$H_l(f * m_\omega)(x) = \lim_{N \to \infty} \frac{1}{4N^2} \int_{[-N,N]^2} H_l\Big[ \sum_{s \in \mathcal{U}_\omega} \langle f, \varphi_s(p + \cdot) \rangle\, \varphi_s(p + x) \Big]\, dp.$$

This allows us to restrict attention to the model operator that we define
shortly. Define

$$\psi_s = \psi_{\log(\mathrm{length}(s))}$$

and

$$\phi_s(x_1, x_2) = \int \psi_s(t)\, \varphi_s(x_1 - t, x_2 - t\, u(x_1))\, dt.$$

We record the following fact for use in the proof of the tree lemma in Section
10.2.2.

Lemma 3. We have $\phi_s(x) = 0$ unless $u(x) \in \omega_{s,2}$.

Proof. Use Plancherel's theorem and the Fourier supports of $\psi_s$ and $\varphi_s$. □

2.4. The model operator. We can finally define our model operator:

$$\mathcal{C} f = \sum_{s \in \mathcal{U}} \langle f, \varphi_s \rangle\, \phi_s.$$

For readers following the executive summary: $\varphi_s$ is a standard wave packet
adapted to the tile $s$, and $\phi_s$ is the appropriate scale of the Hilbert transform
acting on $\varphi_s$. A good mental shortcut is to imagine $\phi_s(x) = \varphi_s(x) 1_{\omega_{s,2}}(u(x))$,
an expression quite similar to one appearing in the Lacey-Thiele proof of
Carleson's theorem. By Lemma 2, each such operator is an average of models
of the form $\mathcal{C}$. Hence it is enough to prove the following theorem.

Theorem 4. With $\mathcal{C}$ defined immediately above, and $p \in (1,\infty)$, we have

$$\|\mathcal{C} f\|_p \lesssim \|f\|_p. \qquad (2.1)$$

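By appealing to restricted weak-type interpolation, it suffices to prove

$$|\langle \mathcal{C} 1_F, 1_E \rangle| \lesssim |E|^{1 - \frac{1}{p}} |F|^{\frac{1}{p}}$$

for arbitrary $E, F \subseteq \mathbb{R}^2$ and $p \in (1, \infty)$. Of course by the triangle inequality
it suffices to prove the following inequality:

$$\sum_{s \in S} |\langle 1_F, \varphi_s \rangle \langle 1_E, \phi_s \rangle| \lesssim |E|^{1 - \frac{1}{p}} |F|^{\frac{1}{p}} \qquad (2.2)$$

for any $p \in (1, \infty)$, any $E, F \subseteq \mathbb{R}^2$, and any finite $S \subseteq \mathcal{U}$. This is our task
for the rest of the paper. Lacey and Li have already proved this estimate for
arbitrary vector fields when $p > 2$. We discuss this proof in the appendix.
Note that for $p < 2$, we have

$$|E|^{1 - \frac{1}{p}} |F|^{\frac{1}{p}} = |E|^{\frac{1}{2}} |F|^{\frac{1}{2}} \left( \frac{|F|}{|E|} \right)^{\frac{1}{p} - \frac{1}{2}} \ge |E|^{\frac{1}{2}} |F|^{\frac{1}{2}}$$

whenever $|F| \ge |E|$ because $\frac{1}{p} - \frac{1}{2} > 0$. Hence our estimate is already proved
when $|F| \ge |E|$, so we restrict attention to the case $|F| \le c|E|$ for some small
constant $c$.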

3. Key definitions 

Definition 5. Given a parallelogram R, we write CR to denote the paral- 
lelogram with the same center as R but dilated by a factor of C. 

Definition 6. Given two parallelograms $R_1$ and $R_2$ in $\mathcal{U}$, we will write
$R_1 \le R_2$ whenever $R_1 \subseteq C R_2$ and $\omega_{R_2} \subseteq \omega_{R_1}$.

Recall that $\omega_R$ is defined in Section 2.3. The exact value of $C$ in the last
definitions is not important: 10 is enough. We need that if $R_1 \cap R_2 \ne \emptyset$ and
$\omega_{R_1} \subseteq \omega_{R_2}$, then $R_2 \le R_1$.

Definition 7. A tree is a collection $T$ of parallelograms with a top
parallelogram, denoted $\mathrm{top}(T)$, with $\mathrm{top}(T) \in \mathcal{U}$, such that for all $s \in T$, we have
$s \le \mathrm{top}(T)$. A tree $T$ is a $j$-tree if $\omega_{\mathrm{top}(T)} \cap \omega_{s,j} = \emptyset$. Given a tree $T$, we
will write $T_j$ to denote the maximal $j$-tree contained in $T$.

Recall that $\omega_{s,1}$ is the right half of $\omega_s$ and $\omega_{s,2}$ is the left half. The
following definitions will help us organize our collections of tiles. Recall
that our vector field $v$ is defined on a set $E$; this set plays a role in the
definitions of $\mathrm{dense}$ and $\overline{\mathrm{dense}}$ below. Similarly, the definition of size
depends on our other set $F$.

For $x \in \mathbb{R}^2$, let $\chi(x) = \frac{1}{1 + |x|^{100}}$. For any parallelogram $s$, let $\chi_s^{(p)}$ be an
$L^p$ normalized version of $\chi$ adapted to the parallelogram $s$.

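Definition 8. Define the following for a parallelogram $s$ and a collection of
parallelograms $S$:

$$E_s = \{(x,y) \in E : u(x) \in \omega_s\}$$

$$\mathrm{dense}(s) = \int_{E_s} \chi_s^{(1)}$$

$$\overline{\mathrm{dense}}(s) = \sup_{s' \ge s,\ s' \in \mathcal{U}} \mathrm{dense}(s')$$

$$\mathrm{size}(S) = \sup_{1\text{-trees } T \subseteq S} \left( \frac{1}{|\mathrm{top}(T)|} \sum_{s \in T} |\langle 1_F, \varphi_s \rangle|^2 \right)^{\frac{1}{2}}.$$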

We remark that the function $\chi$ is needed for density since the wave packets
$\varphi_s$ have Schwartz tails. See the proofs of the tree and density lemmas. The
extra technicality involved in defining $\overline{\mathrm{dense}}$ (as opposed to just $\mathrm{dense}$) is
needed for our proof of the tree lemma (just as it is in the one-dimensional
theory of [6]). The cost is rather high: a density estimate (see Estimate 12
below) is still easily obtainable, but the maximal estimate becomes much
more difficult to prove. If $\overline{\mathrm{dense}}(s)$ were equal to $\mathrm{dense}(s)$ for every tile $s$,
then the tops of the trees constructed in Section 4 would already be prepared
for an application of maximal technology. Unfortunately this is not the case,
and this difficulty prompts our consideration of the collections $\mathcal{R}_j$ in Section
8. See also the delicate sorting algorithm in Lacey-Li [5], where the authors
wrestle with the same issue.

4. Organization 

In this section we carry out the main decomposition of the collection of 
tiles. We sort a given collection of tiles into subsets of tiles of approximately 
constant density, and further into trees of approximately constant size. The 
relevance of trees is shown in the following: 

Lemma 9 (Tree lemma). Let $T$ be a tree. Suppose $\mathrm{dense}(T) \le \delta$. Suppose
$\mathrm{size}(T) \le \sigma$. Then

$$\sum_{s \in T} |\langle 1_F, \varphi_s \rangle \langle 1_E, \phi_s \rangle| \lesssim \delta \sigma |\mathrm{top}(T)|.$$

This is the "Tree Lemma" from [4], which is the 2-D version of the same in
[6]. We prove it in Section 10. It reduces (2.2) to proving for each $0 < \epsilon < 1$

$$\sum_\delta \sum_\sigma \sum_{T \in \mathcal{T}_{\delta,\sigma}} \delta \sigma |\mathrm{top}(T)| \lesssim |F|^{1-\epsilon} |E|^{\epsilon}.$$

We can already prove this with the Estimates 11, 12, 13 (appearing in the
next lemma) and some bookkeeping; this is carried out in Section 6.

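Lemma 10 (Organizational Lemma). Let $S$ be a finite collection of tiles.
Then there exists a partition of $S$ into trees $T \in \mathcal{T}_{\delta,\sigma}$, where $\delta, \sigma$ are dyadic with
$\delta \lesssim 1$ (i.e., $S = \bigcup_{\delta,\sigma} \bigcup_{T \in \mathcal{T}_{\delta,\sigma}} T$), such that the following estimates hold:

Estimate 11. [Orthogonality]

$$\sum_{T \in \mathcal{T}_{\delta,\sigma}} |\mathrm{top}(T)| \lesssim \frac{|F|}{\sigma^2}.$$

Estimate 12. [Density]

$$\sum_{T \in \mathcal{T}_{\delta,\sigma}} |\mathrm{top}(T)| \lesssim \frac{|E|}{\delta}.$$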

Estimate 13. [Maximal] For any e > 0, 

E i tQ p( T )i s 



F\ l - e \E\ 



Remark 14. In fact we can take u < 1, which we need (and prove) in the 
appendix. 

In the remainder of this section we construct the collections of trees $\mathcal{T}_{\delta,\sigma}$.
In the following sections we prove the estimates above. Estimate 11 follows
from the construction of the trees $\mathcal{T}_{\delta,\sigma}$ and the proof of the standard size
lemma; we give a proof in Section 11. We prove Estimates 12 and 13 in
Section 8. We remark that we make these claims about the same family
of trees. This is in contrast to [6], [4], [5], in which the argument has the
form "There exists a family $\mathcal{T}_{\mathrm{size}}$ such that $S_\delta = \bigcup_{T \in \mathcal{T}_{\mathrm{size}}} T$ and such that
the size estimate holds for the collection $\mathcal{T}_{\mathrm{size}}$; further there is a (potentially
different!) family $\mathcal{T}_{\mathrm{density}}$ such that $S_\delta = \bigcup_{T \in \mathcal{T}_{\mathrm{density}}} T$ and such that the
density estimate holds for the collection $\mathcal{T}_{\mathrm{density}}$."

First, we sort the tiles by density: Let

$$S_\delta = \{s \in S : \mathrm{dense}(s) \in (\tfrac{1}{2}\delta, \delta]\}$$

for dyadic $\delta$. By the definition of dense, we need only consider $\delta \le \|\chi\|_1 \lesssim 1$.
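
The dyadic sorting step can be sketched in a few lines of Python. This is only an illustration: the names `dyadic_level` and `sort_by_density` are hypothetical, the density is passed in as an arbitrary callable `dense`, and rejecting nonpositive densities is a choice made for this sketch rather than something stated in the text.

```python
import math
from collections import defaultdict


def dyadic_level(density):
    """Return the dyadic delta = 2**k with delta/2 < density <= delta."""
    if density <= 0:
        raise ValueError("density must be positive")
    delta = 2.0 ** math.ceil(math.log2(density))
    # Guard against floating-point rounding near exact powers of two.
    while density > delta:
        delta *= 2.0
    while density <= delta / 2.0:
        delta /= 2.0
    return delta


def sort_by_density(tiles, dense):
    """Group tiles into buckets keyed by dyadic density level."""
    buckets = defaultdict(list)
    for s in tiles:
        buckets[dyadic_level(dense(s))].append(s)
    return dict(buckets)
```

For example, tiles with densities 0.3 and 0.5 both fall into the bucket $\delta = \frac12$, while a tile with density 0.9 falls into the bucket $\delta = 1$.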

We next sort each collection $S_\delta$ into families of trees with comparable
size. The following algorithm is a slight variant of the sorting algorithm
used in [6] and in [4]. We want to ensure that $\mathrm{top}(T) \in T$ for each tree $T$
in our construction. There are some small technicalities that arise in the
2-D situation due to the non-transitivity of the relation "$\le$". Without loss
of generality, we may assume our collection of tiles $S$ is finite, so we know
there exists $\sigma_{\max}$ such that $\mathrm{size}(T) \le \sigma_{\max}$ for every $T \subseteq S_\delta$. This gives us
a starting point for the following lemma.

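Lemma 15. Let $S$ be a collection of tiles satisfying $\mathrm{size}(S) \le \sigma$. Then
there exists a disjoint collection of trees $\mathcal{T}_\sigma$ such that for all $T \in \mathcal{T}_\sigma$, we have
$\mathrm{top}(T) \in T$, and

$$\mathrm{size}\Big(S \setminus \bigcup_{T \in \mathcal{T}_\sigma} T\Big) \le \frac{\sigma}{2}.$$

Finally, we have the estimate

$$\sum_{T \in \mathcal{T}_\sigma} |\mathrm{top}(T)| \lesssim \frac{|F|}{\sigma^2}, \qquad (4.1)$$

where here $F$ is the set used in the definition of size.

Remark 16. Having $\mathrm{top}(T) \in T$ will be helpful in Section 8. See in
particular the construction of the rectangles $R_T$ and the collections $\mathcal{T}_R$.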

Proof. Initialize

STOCK := $S$
$\mathcal{T}_\sigma := \emptyset$.

In the following scheme we write $C$ to denote the constant used in the
definition of tree (see Definition 7), which we assume is somewhat large.
While there is a 1-tree $T \subseteq$ STOCK with

and with $\mathrm{top}(T) \in T$, choose $T$ with $c(\omega_{\mathrm{top}(T)})$ most clockwise, let $\tilde{T}$ be the
maximal tree in STOCK with top equal to $\mathrm{top}(T)$, and update

STOCK := STOCK $\setminus\ \tilde{T}$

$\mathcal{T}_\sigma := \mathcal{T}_\sigma \cup \{\tilde{T}\}$.

(Again, we write $c(\omega_{\mathrm{top}(T)})$ to denote the center of $\omega_{\mathrm{top}(T)}$.)
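
The control flow of this selection scheme can be sketched in Python. Everything geometric is abstracted into callables supplied by the caller, so the names `qualifying_tops`, `angle`, and `maximal_tree` are hypothetical, and the convention that a smaller `angle` value means "more clockwise" is an assumption of this sketch.

```python
def select_trees(stock, qualifying_tops, angle, maximal_tree):
    """Greedy tree selection.

    stock              -- iterable of tiles
    qualifying_tops(S) -- tiles in S that top a 1-tree in S meeting the
                          size threshold (supplied by the caller)
    angle(t)           -- center of the frequency interval of t; the
                          smallest value is treated as most clockwise
    maximal_tree(t, S) -- set of tiles in S lying below t
    Returns the list of (top, tree) pairs and the leftover stock.
    """
    stock = set(stock)
    trees = []
    while True:
        tops = list(qualifying_tops(stock))
        if not tops:
            break
        top = min(tops, key=angle)
        # Include the top itself so that the stock strictly shrinks.
        tree = set(maximal_tree(top, stock)) | {top}
        trees.append((top, tree))
        stock -= tree
    return trees, stock
```

Because each pass removes at least the chosen top from the stock, the loop terminates for any finite collection of tiles.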

Remark 17. We remark that our choice of $c(\omega_{\mathrm{top}(T)})$ most clockwise will
be used in the proof of Estimate (4.1) in Section 11. See specifically Claim 34.

When no such trees remain, we have the collection of trees $\mathcal{T}_\sigma$ described
in the statement of the lemma. By construction we see that $\mathrm{top}(T) \in T$
and that the size of each $T \in \mathcal{T}_\sigma$ exceeds the selection threshold. The estimate
(4.1) follows rather standard arguments; we present the proof in Section 11.
It remains to prove the following:

Claim 18.

$$\mathrm{size}(\mathrm{STOCK}) \le \frac{\sigma}{2}.$$

Consider a tree $T \subseteq$ STOCK. Without loss of generality, $T$ is a 1-tree
(since the definition of size only takes into consideration 1-tree subtrees of
$T$ anyway). We will partition $T$ into a collection $\mathcal{T}_T$ of subtrees of $T$, each
of which contains its top, as follows: Initialize

PANTRY := $T$

$T_{\max} := \emptyset$.

While PANTRY is nonempty, choose a tile $t$ of maximal length in PANTRY,
let $T_t$ be the maximal subset of PANTRY such that $s \le t$ for $s \in T_t$, and
update

PANTRY := PANTRY $\setminus\ T_t$

$T_{\max} := T_{\max} \cup \{t\}$.
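
The PANTRY loop just described is a plain greedy partition, sketched here in Python. The callables `length` and `leq` stand in for the tile length and the relation "$\le$" and are supplied by the caller; these names, and the representation of tiles as hashable objects, are assumptions of the sketch.

```python
def partition_into_subtrees(tree, length, leq):
    """Partition a tree into subtrees that each contain their own top.

    Repeatedly pick a tile t of maximal length from the pantry, collect
    every remaining tile s with leq(s, t), and remove that subtree.
    Returns the list of chosen tops and a dict mapping each top to its subtree.
    """
    pantry = set(tree)
    tops = []
    subtrees = {}
    while pantry:
        t = max(pantry, key=length)
        subtree = {s for s in pantry if leq(s, t)}
        subtree.add(t)  # the top always belongs to its own subtree
        subtrees[t] = subtree
        tops.append(t)
        pantry -= subtree
    return tops, subtrees
```

In a one-dimensional toy model where tiles are intervals and `leq` is containment, the chosen tops are exactly the maximal intervals, and every other interval is absorbed by the first top that contains it.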

It is clear that this construction exhausts all of $T$; i.e., eventually PANTRY
becomes empty. Since the tiles $t \in T_{\max}$ all satisfy $\omega_{\mathrm{top}(T)} \subseteq \omega_t$, and
since each is maximal with respect to "$\le$", we know these tiles are pairwise
disjoint. On the other hand, they are all contained in $C\,\mathrm{top}(T)$, and $t =
\mathrm{top}(T_t)$, so

$$\sum_{t \in T_{\max}} |\mathrm{top}(T_t)| \le C |\mathrm{top}(T)|.$$

£ |top(T t )|<qtop(T)|. 
Further, since each tree $T_t$ for $t \in T_{\max}$ contains its top, we know



for otherwise $T_t$ would have been selected and put into $\mathcal{T}_\sigma$. Hence



StT teTmax S&Tt 

J2 



< 

This implies 



< e i to p( T *)i^2 

c7 2 |top(r)| 



c 



size(T) < 
which proves the claim provided $C > 4$. □

By applying the lemma iteratively to each collection $S_\delta$, we obtain
collections $S_{\delta,\sigma}$ and $\mathcal{T}_{\delta,\sigma}$ such that

$$S_{\delta,\sigma} = \bigcup_{T \in \mathcal{T}_{\delta,\sigma}} T,$$

where the union is disjoint, such that $\mathrm{dense}(s) \sim \delta$ for $s \in S_{\delta,\sigma}$, and such
that



for $T \in \mathcal{T}_{\delta,\sigma}$. This proves Lemma 10, except for Estimates 12 and 13. Note
that Estimate 11 follows from (4.1).

5. Main Lemmas 

Here we present the main lemmas needed to prove Estimates 12 and 13. 

Lemma 19. Suppose $\mathcal{R}$ is a collection of pairwise incomparable (under "$\le$")
parallelograms of uniform width such that $\mathrm{dense}(R) \ge \delta$ for $R \in \mathcal{R}$. Then

$$\sum_{R \in \mathcal{R}} |R| \lesssim \frac{|E|}{\delta}.$$

Lemma 19 is nothing more than the Density Lemma from [6] with
straightforward modifications for the 2-D setting.

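Lemma 20. Suppose $\mathcal{R}$ is a collection of pairwise incomparable (under "$\le$")
parallelograms of uniform width such that for each $R \in \mathcal{R}$, we have

$$\frac{|E \cap u^{-1}(\omega_R) \cap R|}{|R|} \ge \delta \qquad (5.1)$$

and

$$\frac{1}{|R|} \int_R 1_F \ge \lambda. \qquad (5.2)$$

Then for each $\epsilon > 0$,

$$\sum_{R \in \mathcal{R}} |R| \lesssim \frac{|F|}{\delta \lambda^{1+\epsilon}}.$$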

The proof of Lemma 20 is contained in Section 3 of [1]. More specifically,
see estimate (3.10) on page 959, as well as the construction of the collection
of parallelograms called $\mathcal{R}_1$ there. Note that this last lemma requires an
assumption of the form

$$\frac{1}{|R|} \int_R 1_F \ge \lambda;$$

on the other hand, our assumption on $T \in \mathcal{T}_{\delta,\sigma}$ is that $\mathrm{size}(T) \le \sigma$ and



where $T_1$ is the maximal 1-tree in $T$. The following lemma shows that the
second kind of fact implies the first without much loss:

Lemma 21. Let $F \subseteq \mathbb{R}^2$. Suppose $T$ is a tree with $\mathrm{size}(T) \le \sigma$ and



Lemma 21 is proved in Section 9; it follows from $L^p$ and BMO-type
estimates on a square function related to the notion of size.

Estimate 13 deserves more prominent mention. An estimate in this spirit
was proved in [5]. However here we have much better dependence on the
parameter $\delta$ due to a rather simple observation. The argument in [5] follows
essentially the argument of the density lemma, with an appeal to a maximal
theorem to control $|\{M_\delta 1_F > \lambda\}|$. In our case of a vector field depending on
only one variable, the relevant maximal operator was studied by the author
in [1], [2]. However this approach is inefficient. Instead of combining the
density argument with a maximal function estimate (each of which costs a
factor depending on $\delta$), we appeal to an argument made in [1], which directly
estimates



for any $\epsilon > 0$. In fact, this estimate was established en route to a covering
lemma which implies the maximal theorem. Interestingly, the improved $L^2$
estimates established in [2], which interpolate to give improved $L^p$ estimates,
are unhelpful in this setting, precisely because they are estimates on the
operator norm, rather than on a sum like the one appearing immediately
above.



6. Balancing the estimates

In this section we carry out some computations which allow us to prove
(2.2), and hence the main theorem. We now estimate

$$\sum_\delta \sum_\sigma \sum_{T \in \mathcal{T}_{\delta,\sigma}} \delta \sigma |\mathrm{top}(T)|.$$

We have two cases. Recall that $E$ and $F$ are sets with $|F| \le |E|$.

6.1. Case 1: $\delta \ge \frac{|F|}{|E|}$. A quick computation shows that (up to additive $O(\epsilon)$
terms in the exponents)

• the maximal estimate is more efficient when $\sigma \ge \frac{|F|}{|E|}$;

• the density lemma is more efficient when $\sigma \le \frac{|F|}{|E|}$.

Remark 22. The maximal estimate is more effective than the size estimate
for $\delta \ge \frac{|F|}{|E|}$ and $\sigma$ close to $\frac{|F|}{|E|}$. Without this, we would not be able to obtain
$L^p$ estimates for any $p < 2$.

For the first range, with $\delta$ fixed, we have for any $\epsilon > 0$

E E Mtop(r)| < y. 

a> \JlTeTs,v a> lH 
- 1^1 



m 1 -^ y — 



rr> |F| 



Summing this over dyadic $1 \ge \delta \ge \frac{|F|}{|E|}$ gives us a total of $\lesssim |F|^{1-3\epsilon} |E|^{3\epsilon}$.
For the second range, with $\delta$ fixed, we have



£ £ Htop(T)| < Yl S ° lL 



s 



E 



Once again, summing this over dyadic $1 \ge \delta \ge \frac{|F|}{|E|}$ gives us a total of

6.2. Case 2: $\delta < |F|/|E|$. In this case, the size and density estimates alone will
be enough for us. A quick computation shows that

• The size estimate is most efficient when $\sigma > \sqrt{\delta |F|/|E|}$.

• The density estimate is most efficient when $\sigma < \sqrt{\delta |F|/|E|}$.

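We decompose our sum over $\sigma$ into these two ranges. For the first range,
we have

\[
\sum_{\sigma > \sqrt{\delta|F|/|E|}} \sum_{T \in \mathcal{T}_{\delta,\sigma}} \sigma\delta\,|\mathrm{top}(T)|
\lesssim \sum_{\sigma > \sqrt{\delta|F|/|E|}} \sigma\delta\,\frac{|F|}{\sigma^2}
\lesssim |F|^{1/2} |E|^{1/2} \delta^{1/2}.
\]

Summing over $\delta < |F|/|E|$ gives us a total of $\lesssim |F| < |F|^{1-\epsilon}|E|^{\epsilon}$ since $|F| < |E|$.

For the second range, we have

\[
\sum_{\sigma < \sqrt{\delta|F|/|E|}} \sum_{T \in \mathcal{T}_{\delta,\sigma}} \sigma\delta\,|\mathrm{top}(T)|
\lesssim \sum_{\sigma < \sqrt{\delta|F|/|E|}} \sigma\delta\,\frac{|E|}{\delta}
\lesssim \sqrt{|F||E|\delta}.
\]

Once again, summing over $\delta < |F|/|E|$ gives us a total of $\lesssim |F| < |F|^{1-\epsilon}|E|^{\epsilon}$,
since $|F| < |E|$.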
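
The bookkeeping in Case 2 can be sanity-checked numerically. The following is a minimal sketch and not part of the proof. It assumes that each summand $\sigma\delta\sum_T|\mathrm{top}(T)|$ is bounded by the smaller of a size bound $|F|/\sigma^2$ and a density bound $|E|/\delta$, with all implicit constants set to 1, and it sums over dyadic $\delta < |F|/|E|$ and dyadic $\sigma \le 1$. The function name `case2_total` and the truncation parameter `depth` are illustrative choices.

```python
import math


def case2_total(F: float, E: float, depth: int = 200) -> float:
    """Sum sigma*delta*min(F/sigma^2, E/delta) over dyadic delta < F/E
    and dyadic sigma <= 1 (both series truncated after `depth` terms)."""
    assert 0 < F < E
    total = 0.0
    # Smallest integer i with 2**-i strictly below F/E.
    i0 = math.floor(-math.log2(F / E)) + 1
    for i in range(i0, i0 + depth):
        delta = 2.0 ** -i
        for m in range(depth):
            sigma = 2.0 ** -m
            size_bound = F / sigma ** 2      # assumed form of the size estimate
            density_bound = E / delta        # assumed form of the density estimate
            total += sigma * delta * min(size_bound, density_bound)
    return total
```

Under these assumptions the two geometric series give a total of at most $4|F|/(1 - 2^{-1/2}) < 14|F|$ for any $0 < |F| < |E|$, in line with the claim that this case contributes $\lesssim |F|$.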

This completes the proof of the main estimate (2.2) modulo the proofs of 
the lemmas, which are given in the following sections. 

7. Density lemma 

In this section we prove Lemma 19. Let $\mathcal{R}$ be as in the hypotheses of the
lemma. For $k = 0, 1, 2, \dots$, let $\mathcal{R}_k$ be the collection of $R \in \mathcal{R}$ such that

\[ |u^{-1}(\omega_R) \cap 2^k R \cap E| \ge \frac{1}{100}\,\delta\, 2^{20k}\, |2^k R|, \]

and such that $k$ is the least integer with this property. Note $\mathcal{R} = \bigcup_k \mathcal{R}_k$,
since if $R \in \mathcal{R}$ but $R \notin \bigcup_k \mathcal{R}_k$, then



dense(i?) < I X r 

Je r 



oo 



k=0 

oo 



< -i ^_ y^2 25A: | J R|2- 100fe 

- mn I Rl ^ 11 



< 



100 \R\ 

1 1 k=0 

6_ 
50' 

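We now run an iterative selection procedure to find a subset $\widetilde{\mathcal{R}}_k$ of $\mathcal{R}_k$ such
that the parallelograms $2^k R$, $R \in \widetilde{\mathcal{R}}_k$, are disjoint:
Initialize

STOCK := $\mathcal{R}_k$
$\widetilde{\mathcal{R}}_k$ := $\emptyset$.

While STOCK $\neq \emptyset$, choose $R \in$ STOCK with maximal length, let

$A_R = \{R' \in \mathrm{STOCK} : 2^k R' \cap 2^k R \neq \emptyset \text{ and } \omega_R \cap \omega_{R'} \neq \emptyset\}$,
and update

STOCK := STOCK $\setminus A_R$
$\widetilde{\mathcal{R}}_k$ := $\widetilde{\mathcal{R}}_k \cup \{R\}$.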
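
The selection loop above is a greedy, covering-lemma style algorithm, and the same pattern recurs in Section 8. A minimal sketch in Python follows. The callables `length` and `conflicts` are hypothetical stand-ins for the length of a parallelogram and for the relation "$2^k R' \cap 2^k R \neq \emptyset$ and $\omega_R \cap \omega_{R'} \neq \emptyset$". The toy example uses open intervals on the line rather than parallelograms.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

X = TypeVar("X")


def greedy_select(
    items: Sequence[X],
    length: Callable[[X], float],
    conflicts: Callable[[X, X], bool],
) -> Tuple[List[X], List[List[X]]]:
    """Repeatedly pick a longest remaining item R, record A_R (R together
    with every remaining item that conflicts with R), and delete A_R."""
    stock = list(range(len(items)))  # indices of items still in STOCK
    selected: List[X] = []
    groups: List[List[X]] = []       # groups[n] plays the role of A_R
    while stock:
        r = max(stock, key=lambda i: length(items[i]))
        a_r = [i for i in stock if i == r or conflicts(items[i], items[r])]
        stock = [i for i in stock if i not in a_r]
        selected.append(items[r])
        groups.append([items[i] for i in a_r])
    return selected, groups


def overlap(p, q):
    """Toy conflict relation: two open intervals (a, b) intersect."""
    return p[0] < q[1] and q[0] < p[1]


def interval_length(p):
    return p[1] - p[0]


intervals = [(0, 4), (3, 5), (6, 7), (4.5, 6.5), (10, 11)]
selected, groups = greedy_select(intervals, interval_length, overlap)
```

Every discarded item lands in the group of a selected item that is at least as long and conflicts with it. This is the property the text uses to bound $\sum_{R' \in A_R} |R'|$ by a dilate of the selected $R$.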
Note that the parallelograms in $A_R$ are pairwise disjoint by the pairwise
incomparability of parallelograms in $\mathcal{R}$, and because $\omega_{R'} \cap \omega_R \neq \emptyset$ for $R' \in
A_R$. Hence, using the definition of $\mathcal{R}_k$, we have

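\[
\sum_{R \in \mathcal{R}_k} |R| = \sum_{R \in \widetilde{\mathcal{R}}_k} \sum_{R' \in A_R} |R'|
\lesssim 2^{2k} \sum_{R \in \widetilde{\mathcal{R}}_k} |R|
\lesssim 2^{2k} 2^{-20k} \frac{1}{\delta} \sum_{R \in \widetilde{\mathcal{R}}_k} |u^{-1}(\omega_R) \cap 2^k R \cap E|
\]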

where in the last inequality we have used the fact that the parallelograms
$2^k R$ are pairwise incomparable, and that $\omega_R = \omega_{2^k R}$, so that the sets
$\{u^{-1}(\omega_R) \cap 2^k R\}$ are disjoint. Finally, we sum over $k$ to obtain the result.

8. Proofs of maximal and density estimates 

We now look more closely at the collections $\mathcal{T}_{\delta,\sigma}$. For the remainder of this
section we regard $\delta$ and $\sigma$ as fixed. Notation in this section is understood to
depend on both $\delta$ and $\sigma$. (So, for example, $\mathcal{T} = \mathcal{T}_{\delta,\sigma}$.) We begin by isolating
a collection of tiles with density $\delta$. First, let

\[ \widetilde{\mathcal{R}} = \{R \in \mathcal{U} : \mathrm{dense}(R) \sim \delta\}. \]

We now find a maximal subset $\mathcal{R}$ of $\widetilde{\mathcal{R}}$ whose elements are pairwise
incomparable. Initialize:

STOCK := $\widetilde{\mathcal{R}}$
$\mathcal{R}$ := $\emptyset$.

While STOCK $\neq \emptyset$, choose $R$ of maximal length in STOCK. Define
$A_R = \{R' \in \mathrm{STOCK} : R' \le R\}$,

and update

STOCK := STOCK $\setminus A_R$
$\mathcal{R}$ := $\mathcal{R} \cup \{R\}$.

When the loop terminates, elements of $\mathcal{R}$ are pairwise incomparable (under
$\le$), and $\mathcal{R}$ is maximal with respect to this property.

Remark 23. Recall that for $T \in \mathcal{T}$, dense(top(T)) $\sim \delta$, but maybe dense(top(T))
is much less than $\delta$. This makes the maximal Lemma 20 unavailable to us.
Note that several ingredients are required, and top(T) may lack the dense
required. The work in this section goes to organizing the trees in such a way
that we can legitimately appeal to Lemma 20.
Next we associate to each tree $T \in \mathcal{T}$ a parallelogram $R_T \in \mathcal{R}$. This
requires a few steps. Note that for each $s \in \bigcup_{T \in \mathcal{T}} T$, we have dense(s) $\sim
\delta$. By Lemma 15, we know that top(T) $\in T$ for each $T \in \mathcal{T}$. Hence
dense(top(T)) $\sim \delta$. This means there exists a parallelogram $\widetilde{R} \in \widetilde{\mathcal{R}}$ such
that dense($\widetilde{R}$) $\sim \delta$ and such that top(T) $\le \widetilde{R}$. (This is the reason why it
is convenient to have top(T) $\in T$.) Further, for each $\widetilde{R} \in \widetilde{\mathcal{R}}$, there is $R \in \mathcal{R}$
(again, possibly not unique) such that $\widetilde{R} \le R$. Hence we may assign to each
$T \in \mathcal{T}$ some $R \in \mathcal{R}$, and there is $\widetilde{R}$ such that top(T) $\le \widetilde{R} \le R$. (Of course
there may be more than one $R$ to choose from for each $T$; choose one!) Call
this parallelogram $R_T$. Now for each $R \in \mathcal{R}$, define

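\[ \mathcal{T}_R = \{T \in \mathcal{T} : R_T = R\}. \]

By construction,

\[ \mathcal{T} = \bigcup_{R \in \mathcal{R}} \mathcal{T}_R. \]

Our goal now is to control

\[ \sum_{R \in \mathcal{R}} \sum_{T \in \mathcal{T}_R} |\mathrm{top}(T)|. \]

First, we'll show that for all $R \in \mathcal{R}$,

\[ \sum_{T \in \mathcal{T}_R} |\mathrm{top}(T)| \lesssim |R|. \]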

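The collection $\{\mathrm{top}(T) : T \in \mathcal{T}_R\}$ need not be pairwise disjoint, but we do
have the following satisfactory substitute.

Claim 24. There exists $\widetilde{\mathcal{T}}_R \subset \mathcal{T}_R$ such that $\{\mathrm{top}(T) : T \in \widetilde{\mathcal{T}}_R\}$ is pairwise
disjoint and such that

\[ \sum_{T \in \mathcal{T}_R} |\mathrm{top}(T)| \lesssim \sum_{T \in \widetilde{\mathcal{T}}_R} |\mathrm{top}(T)|. \]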

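Proof. Initialize

STOCK := $\mathcal{T}_R$
$\widetilde{\mathcal{T}}_R$ := $\emptyset$.

While STOCK $\neq \emptyset$, choose $T \in$ STOCK such that top(T) is of maximal
length. Then define

$A_T = \{T' \in \mathrm{STOCK} : \mathrm{top}(T') \cap \mathrm{top}(T) \neq \emptyset\}$,

and update

STOCK := STOCK $\setminus A_T$
$\widetilde{\mathcal{T}}_R$ := $\widetilde{\mathcal{T}}_R \cup \{T\}$.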
We stop when STOCK is empty. By construction, the tops of the trees in
$\widetilde{\mathcal{T}}_R$ are pairwise disjoint. Now we show that

\[ \sum_{T' \in A_T} |\mathrm{top}(T')| \le C' |\mathrm{top}(T)|. \]

With this we'll know that

\[ \sum_{T \in \mathcal{T}_R} |\mathrm{top}(T)| = \sum_{T \in \widetilde{\mathcal{T}}_R} \sum_{T' \in A_T} |\mathrm{top}(T')| \le C' \sum_{T \in \widetilde{\mathcal{T}}_R} |\mathrm{top}(T)|. \]

Suppose not. Define $S = \bigcup_{T' \in A_T} T'_1$, where for a tree $T$, define $T_1$ to be
the maximal 1-tree contained in $T$. We claim $S$ can be partitioned into a
small number of trees $S_j$, $j = 1, \dots, 10C^2$, with each a 1-tree. To see that
they are 1-trees, suppose $s \in T' \in A_T$. Then $\omega_{s,2} \supseteq \omega_{\mathrm{top}(T')} \supseteq \omega_{\mathrm{top}(T)}$,
so $\omega_{s,1} \cap \omega_{\mathrm{top}(T)} = \emptyset$. To see that we only need a few trees, just note that
for each $T' \in A_T$, top(T') $\subset C\,\mathrm{top}(T)$. Then since each $s \in T'$ satisfies
$s \subset C\,\mathrm{top}(T')$, we know that $S$ can be partitioned into $\lesssim C^2$ subtrees
$S_j$ by considering (possibly overlapping) tiles in $C^2\,\mathrm{top}(T)$ of height $w$ and
length the same as length of top(T). Hence

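\[
\sum_{j=1}^{10C^2} \sum_{s \in S_j} |\langle f, \varphi_s \rangle|^2
\ge \sum_{T' \in A_T} \sum_{s \in T'_1} |\langle f, \varphi_s \rangle|^2
\gtrsim \sum_{T' \in A_T} \sigma^2 |\mathrm{top}(T')|
\ge C' \sigma^2 |\mathrm{top}(T)|.
\]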

Provided $C'$ is taken large enough (with respect to a universal constant $C$
mentioned in Section 3), one of the trees $S_j$ satisfies size($S_j$) $> 10\sigma$, which
is impossible since the trees $T \in \mathcal{T}_R$ were chosen from a collection with size
less than $\sigma$. This proves the second claim about $\widetilde{\mathcal{T}}_R$. □

8.1. Proof of the density estimate. We are already in position to prove
Estimate 12. Note that the collection $\mathcal{R}$ constructed above is of pairwise
incomparable parallelograms of uniform width and dense $\sim \delta$. Hence the
previous claim, together with Lemma 19, implies

\[
\sum_{R \in \mathcal{R}} \sum_{T \in \mathcal{T}_R} |\mathrm{top}(T)| \lesssim \sum_{R \in \mathcal{R}} |R| \lesssim \frac{|E|}{\delta}.
\]

8.2. Proof of the maximal estimate. The proof of Estimate 13 is a bit
more involved. For the rest of this section, fix $\epsilon > 0$. The first key step is to
sort the parallelograms in $\mathcal{R}$ by how heavily they are covered by the trees
in $\mathcal{T}_R$. Specifically: for integers $j \ge 0$, define

\[ \mathcal{R}_j = \Big\{R \in \mathcal{R} : \sum_{T \in \mathcal{T}_R} |\mathrm{top}(T)| \sim 2^{-j} |R| \Big\}. \]

Since our goal is to control

\[
\sum_{R \in \mathcal{R}} \sum_{T \in \mathcal{T}_R} |\mathrm{top}(T)|
= \sum_j \sum_{R \in \mathcal{R}_j} \sum_{T \in \mathcal{T}_R} |\mathrm{top}(T)|
\sim \sum_j 2^{-j} \sum_{R \in \mathcal{R}_j} |R|,
\]

it is enough to estimate $\sum_{R \in \mathcal{R}_j} |R|$ with suitable dependence on $j$.
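
The sorting into the classes $\mathcal{R}_j$ is a dyadic pigeonholing. A minimal sketch follows, assuming each parallelogram is represented by a hypothetical pair `(area, covered)`, where `covered` stands for $\sum_{T \in \mathcal{T}_R} |\mathrm{top}(T)|$, and reading "$\sim 2^{-j}|R|$" as $2^{-j-1}|R| < \mathrm{covered} \le 2^{-j}|R|$. Ratios above 1 are placed in the class $j = 0$; this convention is an assumption of the sketch.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

Parallelogram = Tuple[float, float]  # (area |R|, covered = sum of |top(T)|)


def dyadic_classes(rs: List[Parallelogram]) -> Dict[int, List[Parallelogram]]:
    """Group parallelograms by j = floor(-log2(covered / area)), clamped at 0."""
    classes: Dict[int, List[Parallelogram]] = defaultdict(list)
    for area, covered in rs:
        if covered <= 0:
            continue  # this R carries no trees and contributes nothing
        ratio = covered / area
        j = max(0, math.floor(-math.log2(ratio)))
        classes[j].append((area, covered))
    return dict(classes)


rs = [(8.0, 8.0), (4.0, 2.0), (10.0, 3.0), (100.0, 1.0)]
classes = dyadic_classes(rs)
total_covered = sum(c for _, c in rs)
# Right-hand side of the reduction: sum_j 2^-j * sum_{R in R_j} |R|.
dyadic_bound = sum(2.0 ** -j * sum(a for a, _ in members)
                   for j, members in classes.items())
```

In this toy data the total covered area and the dyadic sum agree up to a factor of 2, which is the content of the "$\sim$" in the display above.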

In order to apply maximal technology (in the form of Lemma 20), we
must find parallelograms $R$ that heavily intersect $F$, and that also contain a
large subset on which $v$ points in the direction of $R$. Because of the Schwartz
tails in the definition of dense, we do not know that each $R \in \mathcal{R}_j$ satisfies

\[ |u^{-1}(\omega_R) \cap E \cap R| \ge \delta |R|. \]

Rather, we know that

\[ |u^{-1}(\omega_R) \cap E \cap 2^k R| \ge 2^{20k} \delta |R| \qquad (8.1) \]

for some integer $k \ge 0$, as in Section 7. Define $\mathcal{R}_{j,k}$ to be the set of $R \in \mathcal{R}_j$
such that condition (8.1) holds for $R$ but such that it does not hold with
any smaller $k$. Similarly, we cannot conclude that $R$ itself intersects $F$
heavily. Recall that Lemma 21 guarantees that $F$ intersects $\sigma^{-\epsilon}\mathrm{top}(T)$
heavily, whenever $T \in \mathcal{T}_{\delta,\sigma}$; we cannot, however, conclude that $F$ intersects
top(T) itself. This causes some minor differences in the treatment of the
cases $2^k > \sigma^{-\epsilon}$ and $2^k < \sigma^{-\epsilon}$ that the reader should not take too seriously.
It suffices then to control sums like

\[ \sum_{R \in \mathcal{R}_{j,k}} |R| \]

with suitable dependence on $k$ and $j$.

8.2.1. Case 1: $2^k > \sigma^{-\epsilon}$. We want to apply Lemma 20 to the collection $\mathcal{R}_{j,k}$.
The defining condition of $\mathcal{R}_{j,k}$ gives us the kind of information needed by
the hypothesis (5.1). The following claim gives us the kind of information
needed by the hypothesis (5.2).

Claim 25. For $R \in \mathcal{R}_{j,k}$,

\[ \frac{|2^k R \cap F|}{|2^k R|} \gtrsim 2^{-j} 2^{-2k} \sigma^{1+\epsilon}. \]

We postpone the proof of the claim until the end of this section. With the
claim, the only ingredient still needed to apply Lemma 20 is the pairwise
incomparability of the parallelograms in question. We arrange this with the
usual type of sorting algorithm. Initialize

STOCK := $\mathcal{R}_{j,k}$
$\widetilde{\mathcal{R}}_{j,k}$ := $\emptyset$.

While STOCK $\neq \emptyset$, choose $R$ with maximal length, let

$A_R = \{R' \in \mathrm{STOCK} : 2^k R' \cap 2^k R \neq \emptyset \text{ and } \omega_R \cap \omega_{R'} \neq \emptyset\}$,
and update

STOCK := STOCK $\setminus A_R$
$\widetilde{\mathcal{R}}_{j,k}$ := $\widetilde{\mathcal{R}}_{j,k} \cup \{R\}$.

(Note $\omega_R = \omega_{CR}$ for any $C$.) Since the parallelograms $R' \in A_R$ are pairwise
incomparable, we know they are in fact disjoint (see earlier in Section 8 for
a similar argument), so

\[ \sum_{R' \in A_R} |R'| \lesssim |2^k R|. \]

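Hence

\[
\sum_j \sum_k \sum_{R \in \mathcal{R}_{j,k}} \sum_{T \in \mathcal{T}_R} |\mathrm{top}(T)|
\lesssim \sum_j \sum_k \sum_{R \in \mathcal{R}_{j,k}} 2^{-j} |R|
\lesssim \sum_j \sum_k \sum_{R \in \widetilde{\mathcal{R}}_{j,k}} \sum_{R' \in A_R} 2^{-j} |R'|
\lesssim \sum_j \sum_k \sum_{R \in \widetilde{\mathcal{R}}_{j,k}} 2^{-j} |2^k R|.
\]

We now focus our attention on

\[ \sum_{R \in \widetilde{\mathcal{R}}_{j,k}} 2^{-j} |2^k R|. \]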

Claim 25 together with the defining condition for parallelograms in $\mathcal{R}_{j,k}$
allows us to apply Lemma 20, with "$\delta$" in (5.1) being $2^{20k}\delta$ and "$\lambda$" in (5.2)
being $2^{-j} 2^{-2k} \sigma^{1+\epsilon}$, as in Claim 25. The huge gain in $k$ from (8.1) allows
us to sum the contributions from the various $\mathcal{R}_{j,k}$. More specifically, Lemma
20 yields

\[
\sum_{R \in \widetilde{\mathcal{R}}_{j,k}} |2^k R| \lesssim \frac{|F|}{2^{20k}\, \delta\, (\sigma^{1+\epsilon} 2^{-2k} 2^{-j})^{1+\epsilon}}.
\]
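This obviously sums in $k$ to prove

\[
\sum_{R \in \mathcal{R}_j} \sum_{T \in \mathcal{T}_R} |\mathrm{top}(T)| \lesssim \sum_{R \in \mathcal{R}_j} 2^{-j} |R| \lesssim \frac{2^{-j} |F|}{\delta\, (\sigma^{1+\epsilon} 2^{-j})^{1+\epsilon}};
\]

this estimate is effective for small $j$. Estimate 12 tells us that for any $j$,

\[
\sum_{R \in \mathcal{R}_j} \sum_{T \in \mathcal{T}_R} |\mathrm{top}(T)| \lesssim \sum_{R \in \mathcal{R}_j} 2^{-j} |R| \lesssim 2^{-j} \frac{|E|}{\delta};
\]

this estimate is effective for large $j$. It remains to balance these two
estimates: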

EEEi*°p( r )i= £ £ 2 ^i+ £ £ 2 ^i 

~ E 2 %JiJ,) 2 + E 2 ^V 



< IFI 1 "^! 



~ 5<7(i+«) a 
|F| 1_5e |.E| 5e 



Remark 26. Of course the first sum above is empty when $\sigma < |F|/|E|$; in this
case we recover the density estimate. Recalling Section 6, we see that in this
range of $\sigma$ we have no need for the maximal estimate anyway.

This completes the proof of the maximal estimate, except for the proof 
of Claim 25, which we turn to now. 

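Proof of Claim 25. For each $T \in \mathcal{T}_R$, Lemma 21 tells us that

\[ \frac{|\sigma^{-\epsilon}\mathrm{top}(T) \cap F|}{|\sigma^{-\epsilon}\mathrm{top}(T)|} \gtrsim \sigma^{1+\epsilon}. \]

One minor technical problem is that the parallelograms $\sigma^{-\epsilon}\mathrm{top}(T)$ might
not be disjoint. But since all parallelograms $\{\mathrm{top}(T) : T \in \mathcal{T}_R\}$ have
(essentially) the same orientation, we may use a standard covering argument to
select a subset $\widetilde{\mathcal{T}}_R$ of $\mathcal{T}_R$ such that

\[ \{\sigma^{-\epsilon}\mathrm{top}(T)\}_{T \in \widetilde{\mathcal{T}}_R} \]
is pairwise disjoint, and such that

\[ \Big| \bigcup_{T \in \widetilde{\mathcal{T}}_R} \sigma^{-\epsilon}\mathrm{top}(T) \Big| \gtrsim \Big| \bigcup_{T \in \mathcal{T}_R} \sigma^{-\epsilon}\mathrm{top}(T) \Big|. \]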

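Hence

\[
\begin{aligned}
|F \cap C\sigma^{-\epsilon} R|
&\ge \Big| \bigcup_{T \in \widetilde{\mathcal{T}}_R} \sigma^{-\epsilon}\mathrm{top}(T) \cap F \Big| \\
&= \sum_{T \in \widetilde{\mathcal{T}}_R} |\sigma^{-\epsilon}\mathrm{top}(T) \cap F| && \text{by disjointness} \\
&\gtrsim \sigma^{1+\epsilon} \sum_{T \in \widetilde{\mathcal{T}}_R} |\sigma^{-\epsilon}\mathrm{top}(T)| && \text{by Lemma 21} \\
&\ge \sigma^{1+\epsilon} \Big| \bigcup_{T \in \widetilde{\mathcal{T}}_R} \sigma^{-\epsilon}\mathrm{top}(T) \Big| \\
&\gtrsim \sigma^{1+\epsilon} \Big| \bigcup_{T \in \mathcal{T}_R} \sigma^{-\epsilon}\mathrm{top}(T) \Big| \\
&\ge \sigma^{1+\epsilon} \Big| \bigcup_{T \in \mathcal{T}_R} \mathrm{top}(T) \Big| \\
&\ge \sigma^{1+\epsilon} \sum_{T} |\mathrm{top}(T)| && \text{by disjointness} \\
&\gtrsim \sigma^{1+\epsilon} \sum_{T \in \mathcal{T}_R} |\mathrm{top}(T)| && \text{by Claim 24} \\
&\gtrsim \sigma^{1+\epsilon} 2^{-j} |R| && \text{by definition of } \mathcal{R}_j,
\end{aligned}
\]

where the sum in the line marked "by disjointness" is over the pairwise disjoint subcollection of tops provided by Claim 24.

This finishes the proof of Claim 25. □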

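8.2.2. Case 2: $2^k < \sigma^{-\epsilon}$. This section is very similar to the previous section.
As in the last section, we verify the hypotheses of Lemma 20 for a suitable
collection.

We consider all of these collections $\mathcal{R}_{j,k}$ together. Let

\[ \mathcal{R}_{j,\mathrm{small}} = \bigcup_{0 \le k \le \log \sigma^{-\epsilon}} \mathcal{R}_{j,k}. \]

Now we sort the tiles as before: Initialize

STOCK := $\mathcal{R}_{j,\mathrm{small}}$
$\widetilde{\mathcal{R}}_{j,\mathrm{small}}$ := $\emptyset$.

While STOCK $\neq \emptyset$, choose $R$ with maximal length, let

$A_R = \{R' \in \mathrm{STOCK} : \sigma^{-\epsilon} R' \cap \sigma^{-\epsilon} R \neq \emptyset \text{ and } \omega_R \cap \omega_{R'} \neq \emptyset\}$,
and update

STOCK := STOCK $\setminus A_R$
$\widetilde{\mathcal{R}}_{j,\mathrm{small}}$ := $\widetilde{\mathcal{R}}_{j,\mathrm{small}} \cup \{R\}$.

As before, we have

\[
\sum_{R \in \mathcal{R}_{j,\mathrm{small}}} |R| \le \sum_{R \in \widetilde{\mathcal{R}}_{j,\mathrm{small}}} \sum_{R' \in A_R} |R'| \lesssim \sum_{R \in \widetilde{\mathcal{R}}_{j,\mathrm{small}}} |\sigma^{-\epsilon} R|.
\]

We again note several properties of the parallelograms in $\widetilde{\mathcal{R}}_{j,\mathrm{small}}$. First,
they are pairwise incomparable. Second, they satisfy the estimate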

\^R\ ~° 5 - 

This gives us the density estimate 

E^ \°- £ R\ S ^ (8-2) 

-^^^ j , small 
from a direct application of Lemma 19. Third, just as in Claim 25, they
satisfy the estimate

k-RnF| >2 _ v+e 



\a- e R\ 

So by Lemma 20, we have 

y w~ £ r\ % — |F| 1 , . (8.3) 

R^Rsj j small 

As before, we split the sum into large and small j and use (8.2) and (8.3), 
respectively: 

E E X> P cr)i = E E 2-''i*i 

+ E E 2 "^l 

F\ 



y 



. \E\a 



+ £ 2 



< 



a 2 ^ 

^?|l-5e|_g;|5e 



J>logy| 



which is what we needed, since $\epsilon$ is arbitrary.



9. Large size implies large intersection with F 

Remark 27. The title of the section is technically a bit misleading, since
size(T) is actually the supremum over all subtrees of $T$ of an $\ell^2$-type norm;
nevertheless, the trees obtained through the selection procedure in Section 4
all satisfy the property that the full tree (essentially) achieves this supremum.

To prove Lemma 21, we need the following notation. For a fixed 1-tree
$T$, define the operator

\[ A(f) = \Big( \sum_{s \in T} |\langle f, \varphi_s \rangle|^2 \frac{1_s}{|s|} \Big)^{1/2}. \]

We need the following facts about $A$.
Lemma 28. For any $N > 0$, we have

\[ \|Af\|_p \lesssim \|f \beta_{N,T}\|_p \]

for $p \in (1,\infty)$, where

\[ \beta_N(x_1, x_2) = \frac{1}{1 + |x_1|^N + |x_2|^N}, \]

and $\beta_{N,T}$ is an $L^\infty$-normalized version of $\beta_N$ adapted to top(T). The
implicit constant depends on $N$ but not on $T$.

We prove Lemma 28 in Section 13. Of course proving $\|Af\|_2 \lesssim \|f\|_2$ is
straightforward; indeed, it is an easy special case of Lemma 36. The work
is in inserting the smooth cutoff $\beta_N$, which is the point of Lemma 36, and
moving below $L^2$. Second,



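Lemma 29.

\[ \|Af\|_2 \lesssim \frac{1}{|\mathrm{top}(T)|^{1/2}} \int_{C\,\mathrm{top}(T)} Af, \]

provided that $T$ satisfies the following uniform size estimate:

\[
\sup_{1\text{-trees } T' \subset T} \frac{1}{|\mathrm{top}(T')|} \sum_{s \in T'} |\langle f, \varphi_s \rangle|^2
\lesssim \frac{1}{|\mathrm{top}(T)|} \sum_{s \in T} |\langle f, \varphi_s \rangle|^2.
\]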



The condition in the last lemma is the one mentioned in the remark at
the beginning of this section. We prove Lemma 29 in Section 14. The point
of these lemmas is that $\|Af\|_2$ is closely related to size(T). Indeed,

\[ \|Af\|_2^2 = \sum_{s \in T} |\langle f, \varphi_s \rangle|^2. \]

On the other hand, we want information about $|F \cap \mathrm{top}(T)|$ (or possibly
$|F \cap M\mathrm{top}(T)|$ for a dilate $M\mathrm{top}(T)$ of top(T), which is actually what we
will obtain below), which is much more closely related to $\|Af\|_p$ for $p$ close
to 1, as we see below. Combining these two lemmas and Hölder's inequality
gives us



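\[
\begin{aligned}
\Big( \frac{1}{|\mathrm{top}(T)|} \sum_{s \in T} |\langle f, \varphi_s \rangle|^2 \Big)^{1/2}
&= \frac{\|Af\|_2}{|\mathrm{top}(T)|^{1/2}} \\
&\lesssim \frac{1}{|\mathrm{top}(T)|} \int_{C\,\mathrm{top}(T)} Af \\
&\lesssim \Big( \frac{1}{|\mathrm{top}(T)|} \int (Af)^{1+\epsilon} \Big)^{\frac{1}{1+\epsilon}} \\
&\lesssim \Big( \frac{1}{|\mathrm{top}(T)|} \int (f \beta_{N,T})^{1+\epsilon} \Big)^{\frac{1}{1+\epsilon}}.
\end{aligned}
\]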



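Applying this with $f = 1_F$ and a tree $T$ such that

\[ \Big( \frac{1}{|\mathrm{top}(T)|} \sum_{s \in T} |\langle f, \varphi_s \rangle|^2 \Big)^{1/2} \gtrsim \sigma \]

gives us for any $N$,

\[
\sigma^{1+\epsilon} |\mathrm{top}(T)| \lesssim \int (1_F \beta_{N,T})^{1+\epsilon}
\lesssim |\sigma^{-\epsilon}\mathrm{top}(T) \cap F| + \sigma^{\epsilon(N-2)} |\sigma^{-\epsilon}\mathrm{top}(T)|.
\]

This proves Lemma 21 since $N$ can be chosen arbitrarily large with respect
to $\epsilon$.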



10. Proof of Tree Lemma

In this section we present a proof of Lemma 9. Recall that we have a fixed
tree $T$ in mind. For notational convenience we assume that the slope of the
long side of top(T) is zero. We write $\pi_1(E)$, $\pi_2(E)$ to denote the vertical,
horizontal (respectively) projections of a set $E$. Of course the width of every
tile in $T$ is a fixed number $w$. Let $\mathcal{J}_1$ be a partition of $\mathbb{R}$ (the horizontal
axis) into dyadic intervals $J$ such that $3J \times \mathbb{R}$ does not contain any tile $s \in T$,
and such that $J$ is maximal with respect to this property. Now let $\mathcal{J}_2$ be a
partition of $\mathbb{R}$ (the vertical axis) into intervals of width $|\pi_2(\mathrm{top}(T))|$. Let

\[ \mathcal{P} = \bigcup_{J_1 \in \mathcal{J}_1} \bigcup_{J_2 \in \mathcal{J}_2} \{J_1 \times J_2\}. \]

This is a partition of $\mathbb{R}^2$. The parallelograms $P \in \mathcal{P}$ are the smallest relevant
parallelograms for this tree. The parallelograms $P \in \mathcal{P}$ with $\pi_1(P)$ far away
from top(T) are defined so as to still be able to take advantage of the density
estimate for tiles in $T$. Now for each $P \in \mathcal{P}$ we split the operator $L$ into
two pieces, one corresponding to tiles with larger $x$-projection than $P$, the
other to tiles with smaller $x$-projection than $P$: let

\[ L_P^{\pm} = \sum_{s \in T_P^{\pm}} \epsilon_s \langle f, \varphi_s \rangle \phi_s 1_E, \]

where

\[
T_P^{+} = \{s \in T : |\pi_1(s)| \ge |\pi_1(P)|\}, \qquad
T_P^{-} = \{s \in T : |\pi_1(s)| < |\pi_1(P)|\}.
\]


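Note that for appropriate $\epsilon_s$ with $|\epsilon_s| = 1$, we have

\[
\sum_{s \in T} |\langle f, \varphi_s \rangle \langle \phi_s, 1_E \rangle|
= \int \sum_{s \in T} \epsilon_s \langle f, \varphi_s \rangle \phi_s 1_E
\le \sum_{P \in \mathcal{P}} \int_P |L_P^{-}| + \sum_{P \in \mathcal{P}} \int_P |L_P^{+}|. \qquad (10.1)
\]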

The main term will come from parallelograms $P \in \mathcal{P}$ close to top(T);
estimates on parallelograms $P$ away from top(T) will come with a decay factor.
To make things more precise, define for $k \ge 1$,

\[
\mathcal{P}_0 = \Big\{P \in \mathcal{P} : \frac{\mathrm{dist}(\pi_2(P), \pi_2(\mathrm{top}(T)))}{|\pi_2(\mathrm{top}(T))|} \le 1 \Big\},
\qquad
\mathcal{P}_k = \Big\{P \in \mathcal{P} : \frac{\mathrm{dist}(\pi_2(P), \pi_2(\mathrm{top}(T)))}{|\pi_2(\mathrm{top}(T))|} \in (2^{k-1}, 2^k] \Big\}.
\]

We focus first on the first term in (10.1). To control it we need only spatial
decay in both the horizontal and vertical directions.



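10.1. Small tiles. For notational convenience, we further consider for $l \ge 1$,

\[
\mathcal{P}_{k,0} = \Big\{P \in \mathcal{P}_k : \frac{\mathrm{dist}(\pi_1(P), \pi_1(\mathrm{top}(T)))}{|\pi_1(\mathrm{top}(T))|} \le 1 \Big\},
\qquad
\mathcal{P}_{k,l} = \Big\{P \in \mathcal{P}_k : \frac{\mathrm{dist}(\pi_1(P), \pi_1(\mathrm{top}(T)))}{|\pi_1(\mathrm{top}(T))|} \in (2^{l-1}, 2^l] \Big\}.
\]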

We divide the sum in the definition of $L_P^{-}$ into pieces according to how large
the tiles are. Specifically, let

\[ T_j = \{s \in T_P^{-} : |s| = 2^{-j} |\mathrm{top}(T)|\}. \]

The reason for this is that since the tiles $s \in T_P^{-}$ are shorter than $P$, their
frequency intervals can be much larger than that of $P$, meaning we lose
control on $|P \cap \mathrm{supp}(L_P^{-})|$. We use the extra decay from Schwartz tails
to compensate for this. The upper bound of size(T) $\le \sigma$ implies that for
individual tiles $s \in T$ we have $|\langle f, \varphi_s \rangle| \le \sigma |s|^{1/2}$. Hence



(00) 

s 



m~ N 



< rt~ Nk E 

But note that since dense(s) < 5, we have 
5 > I y« 

|p n supp(X; s6T , (/> ^ s )0 s 1b) 



> 2- 10 °( fc +J , '+0. 
This last estimate follows from considering the distance between $s$ and $P$
relative to the length of $s$. Hence for any $P \in \mathcal{P}_{k,l}$, we have



/ \l p \ < W \j2(f^s)4> s i E ) 

JP „--->n J P r.f-T. 



j>0 

< aJ2 2-^2=^2=^ |P nsupp(X;</,V. 

j>0 s£Tj 

< 5\P\a2- 10 ^ 
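Summing over $k$, $l$ and $P$ gives us

\[
\sum_{P \in \mathcal{P}} \int_P |L_P^{-}|
\le \sum_{l \ge 0} \sum_{k \ge 0} \sum_{P \in \mathcal{P}_{k,l}} \int_P |L_P^{-}|
\lesssim \sum_{l \ge 0} \sum_{k \ge 0} \sum_{P \in \mathcal{P}_{k,l}} \sigma\delta |P| 2^{-10k} 2^{-10l}
\lesssim \sigma\delta |\mathrm{top}(T)|,
\]

with the primary contribution coming from $P$ near top(T) as usual.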

10.2. Large tiles. We start by remarking that sorting with respect to horizontal
distance from $T$ (i.e., using the index $l$, as in the previous subsection)
is unnecessary in this subsection. For if $P \in \mathcal{P}_{k,l}$ with $l > C$, then $T_P^{+}$ is
empty, because $|\pi_1(P)| > |\pi_1(\mathrm{top}(T))|$. This fact will appear several times
in what follows. Next, we show that the term under consideration in this
section has small support. Precisely:

Claim 30. For $P \in \mathcal{P}_k$, $L_P^{+} 1_E$ is supported on a set of size $\lesssim \delta |P| 2^{100k}$.

The factor $2^{100k}$ arises from the tail in the definition of dense and the
fact that $P$ is away from top(T). Fortunately, the decay in the functions
$\phi_s$ for $s \in T$ is even greater when $P$ is away from top(T).

Proof. It is convenient to proceed by contradiction. Assume $L_P^{+} 1_E$ has much
larger support than $\delta |P| 2^{100k}$. By the construction of $\mathcal{P}$, we know that there
is some $s \in T$ such that $s \subset C 2^k P$. But this implies there is $R$ of the same
dimensions as $P$, but located spatially over $T$, with $\omega_R \subset \omega_s$ and such that
dense(R) $> 100\delta$, say. Since this implies $s \le R$, we have contradicted the
assumption that dense(s) $\le \delta$. □

We now turn our attention to the second term in (10.1). Recall the
definitions of 1-trees and 2-trees. Clearly for every $s \in T$, either $\omega_{s,1} \cap
\omega_{\mathrm{top}(T)} = \emptyset$ or $\omega_{s,2} \cap \omega_{\mathrm{top}(T)} = \emptyset$, so our tree $T$ can be partitioned as
$T = T_1 \cup T_2$, where $T_j$ is a $j$-tree. Let

\[ (T_P^{+})_j = T_P^{+} \cap T_j \]

for $j = 1, 2$. Of course $(T_P^{+})_j$ is still a $j$-tree. We treat the two cases
separately.






10.2.1. The 2-tree case. This case is a bit easier to handle because of the
location of the support of the function $\phi_s$. More to the point: Since $T_2$
is a 2-tree, if there exists $x$ such that $\phi_s(x)\phi_t(x) \neq 0$ for $s, t \in T_2$, then
$|s| = |t|$. This follows from the fact that $\phi_s(x) = 0$ unless $v(x) \in \omega_{s,2}$,
together with the fact that $\omega_{s,1} \supseteq \omega_{\mathrm{top}(T)}$, and similarly for $t$. (This was
mentioned near the definition of $\phi_s$ in Section 2.) Further, we know that
for any tile $s \in T$, we have $|\langle f, \varphi_s \rangle| \le \sigma |s|^{1/2}$ by the size estimate for $T$.
Combining these observations with Claim 30 and the rapid decay of $\phi_s$ in
the vertical direction gives us for $P \in \mathcal{P}_k$ that

\[ \int_P \sum_{s \in (T_P^{+})_2} \langle f, \varphi_s \rangle \phi_s 1_E \lesssim \sigma\delta 2^{-10k} |P|, \]

since the integrand is uniformly bounded by $\sigma 2^{-200k}$. As mentioned earlier,
if $|\pi_1(s)| \ge |\pi_1(P)|$, then $\pi_1(P) \subset C\pi_1(\mathrm{top}(T))$. Hence

\[ \sum_k \sum_{P \in \mathcal{P}_k} \int_P \sum_{s \in (T_P^{+})_2} \langle f, \varphi_s \rangle \phi_s 1_E \lesssim \delta\sigma |\mathrm{top}(T)|. \]
This completes the estimate for $T_2$.

10.2.2. The 1-tree case. In this case we appeal to orthogonality in the form
of the Bessel inequality in Lemma 36. For parallelograms $P \in \mathcal{P}$ whose
vertical component is large, we need the decay factor from Lemma 36. We
first introduce some extra functions associated to the tiles: let

\[ a_s(x) = \int \psi_s(t)\, \varphi_s(x_1 - t, x_2)\, dt. \]

The difference between $a_s$ and $\phi_s$ is that the vector field $v$ makes no explicit
appearance in the definition of $a_s$; rather, the integral is taken over a
horizontal line for every $x$. In $\phi_s$, however, the integral is taken over an almost
horizontal line, where the precise definition of almost depends on the length
of $s$. (The line is horizontal because we assumed that the slope of the long
side of top(T) is zero. In the general case it is parallel to top(T).) We have
the obvious equality

\[
\int_P \sum_{s \in (T_P^{+})_1} \epsilon_s \langle f, \varphi_s \rangle \phi_s 1_E
= \int_P \sum_{s \in (T_P^{+})_1} \epsilon_s \langle f, \varphi_s \rangle a_s 1_E
+ \int_P \sum_{s \in (T_P^{+})_1} \epsilon_s \langle f, \varphi_s \rangle (\phi_s - a_s) 1_E.
\]



This decomposition allows us to reduce our problem to proving the following 
two claims: 

Claim 31. For each $P \in \mathcal{P}$,



P se(T+) 1 i>0 1 ->•""- sGTx 
Claim 32. For $P \in \mathcal{P}_k$,

\[ \Big| \sum_{s \in (T_P^{+})_1} \epsilon_s \langle f, \varphi_s \rangle (\phi_s - a_s) 1_E \Big| \lesssim 2^{-200k} \sigma. \]

Notice that $\mathrm{supp}\,\hat{a}_s \subset \mathrm{supp}\,\hat{\varphi}_s$, since

This will allow us to prove orthogonality statements about the $a_s$ later in
the proof. For example, from this we can conclude that

sen sen 

because the fact stated above about the Fourier support of the functions
$a_s$ allows us to prove this inequality in the same way we prove the Bessel
inequality in Section 12: expand the square, and notice that $\langle a_s, a_t \rangle = 0$
unless $|s| = |t|$.

Again we remark that if $T_P^{+}$ is nonempty, then $\pi_1(P) \subset C\pi_1(\mathrm{top}(T))$.
Hence in the summation below we can ignore dependence on the parameter
$l$ used in the last section. Given these claims, together with Claim 30, we
control the first term in (10.1) by

E I L+ p s E / e * E 

P&V Pev Jp se(T+) 1 

+ E / £s E (f^s){(/)s - a s )l E 

s EE*/ p E^'^X p iE(/.^i 

k p&v k ^ j>o 1 sen 

+ E E 2- 200 V| J Pnsupp(L+)|. 

k PeV k 

Note that the second term in the last display is controlled by Claim 30. For
$P \in \mathcal{P}_k$, it is convenient to split the function $\sum_{s \in T_1} \langle f, \varphi_s \rangle a_s$ into two pieces,
using the identity $1_{\mathbb{R}^2} = 1_{D_{k-5}} + 1_{(D_{k-5})^c}$, where

\[ D_k = \{(x,y) : |y| < 2^k |\pi_2(\mathrm{top}(T))|\}. \]

In other words, $D_k$ is a horizontal strip of width $\sim 2^k |\pi_2(\mathrm{top}(T))|$. (Obvious
modifications can be made in the case $k < 5$.) For the first piece, the one
closer to top(T), we can use the fact that the tile $P$ is far from top(T)
together with the decay in $j$ to obtain good control. For the second piece,
the one away from top(T), we can take advantage of the decay in the wave
packets associated to tiles in $T$ in the form of the Bessel inequality in Lemma
36. We focus first on the term close to top(T):



EE*/ p E^^/, p iE</^k^- 
= EE*/ p E^^X P iE(/^>.^- 

k PdV k ^ j>k 1 seTi 

= «/" 2- Nk M(\J2(f^s)a s l Dk _ 5 \) 
Ju l=0 u PeVkl P seTi 



2 

fc-B I 



This nearly finishes the proof for the first term, since we may estimate this
$L^2$ norm by using orthogonality in the $x$-variable just as in the proof of
Lemma 36 below. (Readers uncomfortable with this should look to the
proof of Lemma 36.) Specifically, we have



\^2(f, l Ps}a>slD k _ 5 \ 2 = E E^^)^^') / astts 

< Ei^>/)i 2 E / i<wi 

seTi s':\s\=\s'\ 



< Ei^-/)i 

< -2 1 



2 



^|top(T)|. 

We have used symmetry and the $x$-orthogonality in the first inequality above.
This finishes the proof for the first term. To control the second term (the one
away from top(T)), we can appeal directly to a Bessel-type inequality. Here
we use such an inequality for the functions $a_s$ rather than the functions $\varphi_s$,
just as in the estimate above, but we also obtain significant decay in $k$ just
as in Lemma 36. The proof is identical to the proof of Lemma 36. Hence

EE E */ p E^^X p iE(/.^)«.wi 



k i=oPeV k i " r j>o 1 seTi 



< $ [ M(\J2(f^s)a s l {Dk _ 5) c\) 

Jug u PePkl P seTi 

< s\ug u PeVkl P\^ f |^(/,^> s l (Dfc _ 5) c |2 
\[ \lesssim \delta 2^k |\mathrm{top}(T)|^{1/2} \big( \sigma^2 2^{-100k} |\mathrm{top}(T)| \big)^{1/2} \]

\[ \lesssim 2^{-10k} \delta\sigma |\mathrm{top}(T)|, \]
which is what we want.

Proof of Claim 31. Recall that we are considering a point $x \in P$ for some
parallelogram $P$, and we consider the sum

\[ \sum_{s \in T_1 : |\pi_1(s)| \ge |\pi_1(P)|} \langle f, \varphi_s \rangle \phi_s(x). \]

The restriction in the summation already implies that for any $x$, there is
$m(x)$ such that all tiles $s$ which make an appearance in the sum above satisfy
$|\pi_1(s)| \ge m(x)$. Further, since we know that $u(x) \in \omega_{s,2}$, we also have
$M(x)$ such that all tiles $s$ which make an appearance in the sum above satisfy
$|\pi_1(s)| \le M(x)$. Both of these claims are reversible, so

\[ \{s \in T_1 : |\pi_1(s)| \ge |\pi_1(P)|\} = \{s \in T : m(x) \le L(s) \le M(x)\}. \]

Hence it is our goal to estimate

\[ \sum_{s \in T : m(x) \le L(s) \le M(x)} \langle f, \varphi_s \rangle a_s. \]

Denote by A; a Schwartz function such that supp&i C [—1 — 1 + j^j] 2 , 
and such that = 1 for £ G [—1, l] 2 . further denote by fc r the function 
obtained by adapting k to the rectangle x -M; i.e., let k r (x,y) = 

Mf> ™)- With this definition, we know for any iV (which appears in the last 
line of the computation below) 

^2 (f,fs)a s = ^ (f,fs)a s 

sGTi : m{x)<L(s)<M{x) s£Ti : m{x)<L(s) 

s£Ti : L{s)>M(x) 

= C^2{f^s)a s )*k m{x) 
seTi 

- (J2(f,fs)a s ) *k M(x) 
seTi 

£ E^'i^i £</.*>«.!■ 

□ 

Proof of Claim 32. By the argument at the beginning of the proof of Claim
31, it suffices to estimate

\[ \sum_{s \in T : m(x) \le |\pi_1(s)| \le M(x)} \langle f, \varphi_s \rangle (\phi_s(x) - a_s(x)) 1_{\omega_{s,2}}(u(x)). \]

To do this we first estimate $|\phi_s - a_s|$. By definition, we have

\[ |\phi_s(x) - a_s(x)| \le \int |\psi_s(t)|\, |\varphi_s(x_1 - t, x_2 - t u(x)) - \varphi_s(x_1 - t, x_2)|\, dt. \]

To compute the difference in the integrand, estimating the following quantity
will be helpful:

\[ \bigstar := \sup_{z \in [0, t u(x)]} \frac{\partial}{\partial x_2} \varphi_s(x_1 - t, x_2 - z). \]

Fix an integer j > 1 and consider \t\ ~ 2 J '|7Ti(s)|. If (x±,x 2 ) 2- ?+10 s, 
then * < Xs\x\,X2). If (x\,x 2 ) G 2 J,+10 s, then * < 1. We also have that 
^s(i) ^5 2 Afj| s | for any N. Analogous facts hold when j = and |i| < |vri(s)|. 
Let Ij = {t: \t\ ~ 2- ? |7Ti(s)|} for j > 1 and Iq = {t: \t\ < |7Ti(s)|}. Combining 
these observations gives us for (ari,ar 2 ) 2^' +10 s that 

|0 s (*)-a s (z)| < E/ F Ju2^|vr 1 (,)|^M X ( 2 )(x 1 ,x 2 )dt 



< | 7 r 1 ( S )|^ x (2) (x] ., 2) 



If (zi,X2) S 2J'+ 10 s, then we have * < 2 100 ^ 2) , so 

|^(x)-a.(x)| < E/ oZj^-r^kiWI— * 

It. 2 iyj \s\ w 



w 

Since u(x) 6 w S) 2 for all s € Ti, we know u(x) < r^hn- Combining this 

with the fact that |(/, ip a )\ < cr|s| 2 and the estimate immediately above, we 
have 

I £ -a a )| < £ *M*M*)l^^x£ 2) (zi,Z2) 

m(:r)<|iri( a )|<M(s) Ki(s)|<^ 

< CT Xtop(T)( Xl ' X2 )' 



which is what we claimed. 

□ 



11. Proof of size estimate 

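In this section we write $f = 1_F$; note that we do not use the fact that $f$ is
a characteristic function. As with the tree lemma, there are small modifications
required from the one-dimensional situation to handle Schwartz tails
in the vertical direction. We use the Bessel inequality from Lemma 36 to do
this. First we note that by assumption,

\[
\begin{aligned}
\sigma^2 \sum_{T \in \mathcal{T}} |\mathrm{top}(T)|
&\lesssim \sum_{T \in \mathcal{T}} \sum_{s \in T} |\langle f, \varphi_s \rangle|^2 \\
&= \int f \sum_{T} \sum_{s} \langle f, \varphi_s \rangle \varphi_s \\
&\le \|f\|_2 \Big\| \sum_{T} \sum_{s} \langle f, \varphi_s \rangle \varphi_s \Big\|_2.
\end{aligned}
\]

It is enough to prove

\[
\Big\| \sum_{T} \sum_{s} \langle f, \varphi_s \rangle \varphi_s \Big\|_2 \lesssim \sigma \Big( \sum_{T \in \mathcal{T}} |\mathrm{top}(T)| \Big)^{1/2}.
\]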

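By expanding the square and using symmetry, we have

\[
\begin{aligned}
\Big\| \sum_{T \in \mathcal{T}} \sum_{s \in T} \langle f, \varphi_s \rangle \varphi_s \Big\|_2^2
&= \sum_{T \in \mathcal{T}} \sum_{s \in T} \sum_{T' \in \mathcal{T}} \sum_{s' \in T'} \langle f, \varphi_s \rangle \langle f, \varphi_{s'} \rangle \langle \varphi_s, \varphi_{s'} \rangle \\
&\lesssim \sum_{T \in \mathcal{T}} \sum_{s \in T} \sum_{T' \in \mathcal{T}} \sum_{s' \in T' : |s'| = |s|} |\langle f, \varphi_s \rangle \langle f, \varphi_{s'} \rangle \langle \varphi_s, \varphi_{s'} \rangle| \\
&\quad + \sum_{T \in \mathcal{T}} \sum_{s \in T} \sum_{T' \in \mathcal{T}} \sum_{s' \in T' : |s'| < |s|} \langle f, \varphi_s \rangle \langle f, \varphi_{s'} \rangle \langle \varphi_s, \varphi_{s'} \rangle \\
&= B + C.
\end{aligned}
\]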

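Note that

\[ \{s' : |s'| = |s| \text{ and } \omega_s \cap \omega_{s'} \neq \emptyset\} \]

partitions $\mathbb{R}^2$, so

\[ \sum_{|s'| = |s|} |\langle \varphi_s, \varphi_{s'} \rangle| \lesssim 1. \]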

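Hence we can estimate the first term, using symmetry again, by

\[
\begin{aligned}
B &\le \sum_{T \in \mathcal{T}} \sum_{s \in T} \sum_{T' \in \mathcal{T}} \sum_{s' \in T' : |s'| = |s|} |\langle f, \varphi_s \rangle|^2 |\langle \varphi_s, \varphi_{s'} \rangle| \\
&\lesssim \sum_{T \in \mathcal{T}} \sum_{s \in T} |\langle f, \varphi_s \rangle|^2 \\
&\lesssim \sigma^2 \sum_{T \in \mathcal{T}} |\mathrm{top}(T)|.
\end{aligned}
\]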

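Now we look at the second term $C$. By Cauchy-Schwarz, we have

\[
\begin{aligned}
C &\le \sum_{T \in \mathcal{T}} \Big( \sum_{s \in T} |\langle f, \varphi_s \rangle|^2 \Big)^{1/2}
\Big( \sum_{s \in T} \Big| \sum_{T' \in \mathcal{T}} \sum_{s' \in T' : |s'| < |s|} \langle \varphi_s, \varphi_{s'} \rangle \langle f, \varphi_{s'} \rangle \Big|^2 \Big)^{1/2} \\
&\lesssim \sum_{T \in \mathcal{T}} \sigma |\mathrm{top}(T)|^{1/2} D(T)^{1/2},
\end{aligned}
\]

where

\[
D(T) = \sum_{s \in T} \Big| \sum_{T' \in \mathcal{T}} \sum_{s' \in T' : |s'| < |s|} \langle \varphi_s, \varphi_{s'} \rangle \langle f, \varphi_{s'} \rangle \Big|^2.
\]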

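It remains to analyze $D(T)$ for a tree $T \in \mathcal{T}$. We claim that the set of
tiles over which the inner sum ranges is actually independent of $s$. More
specifically, define

\[
A = \Big\{ s' \in \bigcup_{T' \neq T,\ T' \in \mathcal{T}} T' : \omega_{s,1} \cap \omega_{s',1} \neq \emptyset \text{ and } |s'| < |s| \text{ for some } s \in T \Big\}.
\]

Then

Claim 33. For each $s \in T$,

\[
\sum_{T' \in \mathcal{T}} \sum_{s' \in T' : |s'| < |s|} \langle \varphi_s, \varphi_{s'} \rangle \langle f, \varphi_{s'} \rangle = \sum_{s' \in A} \langle \varphi_s, \varphi_{s'} \rangle \langle f, \varphi_{s'} \rangle.
\]

Proof. It is clear from the definition of $A$ that the summation on the left is
over a set of tiles that is contained in $A$. So suppose $s' \in A$; by definition
of $A$, this gives us $\tilde{s} \in T$ such that $|s'| < |\tilde{s}|$ and such that $\omega_{\tilde{s},1} \cap \omega_{s',1} \neq \emptyset$.
This last condition guarantees that $\omega_{s',1} \supseteq \omega_{\mathrm{top}(T)}$. If $|s| \ge |\tilde{s}|$ then of course
$|s| > |s'|$ and $\omega_{s,1} \cap \omega_{s',1} \neq \emptyset$, so that in fact the tile $s'$ appears in the
summation on the left hand side of the claim. If $|s| < |\tilde{s}|$ and $|s| > |s'|$ then
we are done as before. So assume $|s| \le |s'| < |\tilde{s}|$. In this case $\omega_{s,1} \cap \omega_{s',1} = \emptyset$,
which implies that $\langle \varphi_s, \varphi_{s'} \rangle = 0$, finishing the proof of the claim. □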

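Now for a collection of tiles $\mathcal{C}$, define

\[ F(\mathcal{C}) = \sum_{t \in \mathcal{C}} \langle f, \varphi_t \rangle \varphi_t. \]

With this notation, we have

\[ D(T) = \sum_{s \in T} |\langle \varphi_s, F(A) \rangle|^2. \]

Before we proceed, we mention a key disjointness property of tiles in $A$.
Claim 34. Tiles in $A$ are pairwise disjoint.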

Proof. Suppose t, t' £ A. Then there are s,s' £ T such that ut t 2 5 w s 3 
Wtop(T) an d such that cof^ 5 c*V 2 ^top(T')- Hence cot,2 H uit',2 we mav 
assume without loss of generality that u>t,2 Q ^t',2, i-e., that \t'\ < \t\. This 
means the tree T* containing t was selected before the tree containing t'. 
Finally, note that t and t' cannot belong to the same 1-tree, since 0Jt,2 Q w#,2- 
If t n t' ^ 0, then in fact t' C V(top(T*)), and hence t' was included in the 
maximal tree T* containing the 1-tree T*; see the selection algorithm in 
Section 4 for construction of this tree T*. Hence the tiles in A are pairwise
disjoint. □

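We now introduce some more notation to sort the tiles in $A$ according
to how far they are from top(T). For $k \ge 1$, let $R_k = 2^k \mathrm{top}(T)$. Let
$R_0 = \mathrm{top}(T)$. Then let

\[ A_k = \{s' \in A : s' \subset R_k \text{ but } s' \not\subset R_{k-1}\}. \]
Now by Minkowski,

\[
\Big( \sum_{s \in T} |\langle \varphi_s, F(A) \rangle|^2 \Big)^{1/2} \le \sum_k \Big( \sum_{s \in T} |\langle \varphi_s, F(A_k) \rangle|^2 \Big)^{1/2}.
\]

It remains to show

\[ \sum_{s \in T} |\langle \varphi_s, F(A_k) \rangle|^2 \lesssim 2^{-10k} \sigma^2 |\mathrm{top}(T)|. \qquad (11.1) \]

We will use the spatial localization of the tiles $s \in T$ to top(T) to obtain
the desired decay in $k$. We have

\[
\sum_{s \in T} |\langle \varphi_s, F(A_k) \rangle|^2
\lesssim \sum_{s \in T} |\langle \varphi_s, 1_{R_{k-3}} F(A_k) \rangle|^2 + \sum_{s \in T} |\langle \varphi_s, (1 - 1_{R_{k-3}}) F(A_k) \rangle|^2
= I_k + II_k.
\]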

First we estimate Ik- For x G -Rfc-3 and s G i^, we have 
Vs{x)l Rk _ 3 {x) <2- wk ^= x t\x). 



We now estimate $\|1_{R_{k-3}} F(A_k)\|_2$ by duality. We make one small preliminary
observation:

Claim 35. // M is the strong maximal operator, then 

J X { r\x)g{x)dx< J ' Mg(x)dx. 

We remark that each $s \in A$ is essentially pointed in the direction of $T$, so
the strong maximal operator is appropriate here.

Proof. 

f X ^\x)g(x)dx < | s |£ 2 - 3fc -^ f \ g \ 



< 



s| inf Mg{x) 



< 

Is 



J^Mg(x)dx. 



□ 
Consider a function $g \in L^2$, and remember that $|\langle f, \varphi_s \rangle| \le \sigma \sqrt{|s|}$. Then
using the claim above about disjointness of tiles $s \in A_k$, we have

J F(A k )gl Rk _ 3 = j </> 



< 


./ 


' ^ 2- 10 V X (»)(x) fl 








< 


2" 


■ io v Wm 5 








< 


2" 










< 


2" 




< 


2" 


- 10 V(2 2fe |top(T)|)i|M| 2 



which implies that

\[ I_k \lesssim \| 1_{R_{k-3}} F(A_k) \|_2^2 \lesssim \sigma^2 2^{-18k} |\mathrm{top}(T)|. \]

This proves (11.1) for $I_k$.

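To estimate $II_k$, we need only estimate $\| F(A_k) \|_2$ and apply Lemma 36.
We do this just as above: let $g$ be such that $\| g \|_2 = 1$. Then

\[ \int F(A_k)\, g \le \int \Big| \sum_{s \in A_k} \langle f, \varphi_s \rangle\, \varphi_s\, g \Big| \lesssim \sigma \sum_{s \in A_k} \int \chi_s^{(\infty)} |g| \lesssim \sigma \int_{R_k} M g \lesssim \sigma |R_k|^{1/2}. \]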



So

\[ \| F(A_k) \|_2^2 \lesssim \sigma^2 |R_k| \lesssim \sigma^2 2^{2k} |\mathrm{top}(T)|. \]

Hence by Lemma 36,

\[ II_k \lesssim 2^{-10k} \| F(A_k) \|_2^2 \lesssim \sigma^2 2^{-8k} |\mathrm{top}(T)|. \]
Summing in $k$ proves

\[ \sum_{s \in T} |\langle \varphi_s, F(A) \rangle|^2 \lesssim \sigma^2 |\mathrm{top}(T)|, \]

which finishes the proof.
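
As a check on the bookkeeping, the summation in $k$ can be written out explicitly. The following is only a sketch: it assumes bounds of the form $I_k \lesssim \sigma^2 2^{-ak} |\mathrm{top}(T)|$ for some $a > 0$ (the exponent $a$ is a placeholder, not a value taken from the text) and $II_k \lesssim \sigma^2 2^{-8k} |\mathrm{top}(T)|$, and it combines them with the Minkowski step above and the elementary inequality $(x+y)^{1/2} \le x^{1/2} + y^{1/2}$.

```latex
% Assumed inputs (a > 0 is a placeholder exponent, constants are implicit):
%   I_k  \lesssim \sigma^2 2^{-ak} |top(T)|,
%   II_k \lesssim \sigma^2 2^{-8k} |top(T)|.
\begin{align*}
\Big( \sum_{s \in T} |\langle \varphi_s, F(A) \rangle|^2 \Big)^{1/2}
  &\le \sum_{k} \Big( \sum_{s \in T} |\langle \varphi_s, F(A_k) \rangle|^2 \Big)^{1/2} \\
  &\lesssim \sum_{k} \big( I_k + II_k \big)^{1/2} \\
  &\lesssim \sigma\, |\mathrm{top}(T)|^{1/2} \sum_{k} \big( 2^{-ak/2} + 2^{-4k} \big) \\
  &\lesssim \sigma\, |\mathrm{top}(T)|^{1/2}.
\end{align*}
```

Any positive rate of geometric decay in $k$ is enough for the last step, which is why the precise exponents in the bounds for $I_k$ and $II_k$ do not matter.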



12. Localized Bessel inequality 



In this section we prove a Bessel inequality for 1-trees with functions
supported away from the top of the tree. Specifically:

Lemma 36. Let $T$ be a 1-tree. For $k \ge 1$, let $R_k = 2^k\,\mathrm{top}(T)$. For $k \ge 1$,
let $\Omega_k = R_k \setminus R_{k-1}$. Define $\Omega_0 = \mathrm{top}(T)$. Then for any $N > 0$,

\[ \sum_{s \in T} |\langle f 1_{\Omega_k}, \varphi_s \rangle|^2 \lesssim 2^{-Nk} \| f 1_{\Omega_k} \|_2^2. \]

Remark 37. For a classical 1-dimensional tree, this can be proved by using
the extreme spatial decay of the wave packets $\varphi_s$, $s \in T$, away from $\mathrm{top}(T)$.
We use this in conjunction with orthogonality in the $x$-variable to handle
interactions of functions $\varphi_s$, $\varphi_{s'}$ horizontally close to the tree, where tail
estimates do not improve for shorter tiles in the tree. This is the reason for
the decomposition of $\Omega_k$ into $B_k$ and $C_k$ in the proof below.

Proof. For notational convenience, we will assume that the parallelogram
$\mathrm{top}(T)$ is centered at the origin, has width 1, and has sides parallel to the
coordinate axes. First note that

\[ \Big( \sum_{s \in T} |\langle f 1_{\Omega_k}, \varphi_s \rangle|^2 \Big)^{1/2} \le \Big( \sum_{s \in T} |\langle f 1_{B_k}, \varphi_s \rangle|^2 \Big)^{1/2} + \Big( \sum_{s \in T} |\langle f 1_{C_k}, \varphi_s \rangle|^2 \Big)^{1/2} =: B + C, \]

where

\[ B_k = \{ (x,y) \in \Omega_k : |y| > 2^k \}, \qquad C_k = \Omega_k \setminus B_k. \]

To estimate $B$ we will need to use orthogonality in the horizontal variable.
To estimate $C$ we will need only spatial decay, as in the one-dimensional
case.

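Note that by Cauchy-Schwarz

\[
\begin{aligned}
B^2 &= \int f 1_{B_k} \sum_{s \in T} \langle f 1_{B_k}, \varphi_s \rangle\, \varphi_s \\
&\le \| f 1_{\Omega_k} \|_2 \Big( \sum_{s \in T} \sum_{s' \in T} \int_{|y| > 2^k} \int_x \langle f 1_{B_k}, \varphi_s \rangle \langle f 1_{B_k}, \varphi_{s'} \rangle\, \varphi_s(x,y)\, \varphi_{s'}(x,y)\, dx\, dy \Big)^{1/2}.
\end{aligned}
\]

Also note that if $|s| \neq |s'|$ then for every $y$, we have

\[ \int_x \varphi_s(x,y)\, \varphi_{s'}(x,y)\, dx = 0. \]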



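This follows from the definition of the wave packets $\varphi_s$; specifically, note that
$\pi_1(\mathrm{supp}(\hat{\varphi}_s)) \cap \pi_1(\mathrm{supp}(\hat{\varphi}_{s'})) = \emptyset$ whenever $\omega_{s,1} \cap \omega_{s',2} = \emptyset$, which happens
whenever $s, s'$ are in the same 1-tree and $|s| \neq |s'|$. By symmetry we may
estimate $|\langle f 1_{B_k}, \varphi_s \rangle \langle f 1_{B_k}, \varphi_{s'} \rangle| \le |\langle f 1_{B_k}, \varphi_s \rangle|^2$, which gives us

\[
\begin{aligned}
&\sum_{s \in T} \sum_{s' \in T} \langle f 1_{B_k}, \varphi_s \rangle \langle f 1_{B_k}, \varphi_{s'} \rangle \int_{|y| > 2^k} \int_x \varphi_s(x,y)\, \varphi_{s'}(x,y) \\
&\qquad \lesssim \sum_{s \in T} \sum_{s' \in T :\, |s| = |s'|} |\langle f 1_{B_k}, \varphi_s \rangle|^2 \int_{|y| > 2^k} \int_x |\varphi_s|\, |\varphi_{s'}|.
\end{aligned}
\]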



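But note that

\[ \sum_{s' \in T :\, |s'| = |s|} \int_{|y| > 2^k} \int_x |\varphi_s|\, |\varphi_{s'}| \lesssim 2^{-Nk}, \]

because the prototype $\varphi$ is Schwartz, $s \in T$, and $B_k$ is far away from $\mathrm{top}(T)$.
Hence

\[ B^2 \lesssim 2^{-Nk/2} \| f 1_{\Omega_k} \|_2 \Big( \sum_{s \in T} |\langle f 1_{B_k}, \varphi_s \rangle|^2 \Big)^{1/2}. \]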

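We now estimate $C$. Define

\[ T^j = \{ s \in T : |s| = 2^{-j} |\mathrm{top}(T)| \}. \]

Note that if $s \in T^j$, then $|\langle f 1_{C_k}, \varphi_s \rangle| \lesssim 2^{-\frac{N}{2}(k+j)} \| f 1_{\Omega_k} \|_2$ by Cauchy-Schwarz
and the fact that $\| \varphi_s 1_{C_k} \|_2 \lesssim 2^{-\frac{N}{2}(k+j)}$. This last claim follows
from the fact that $\varphi_s$ is highly localized to $\mathrm{top}(T)$, and because $C_k$ is far
away from $\mathrm{top}(T)$ horizontally. (Of course we could not make the same
argument for $B$ because we can do no better than $\| \varphi_s 1_{B_k} \|_2 \lesssim 2^{-Nk}$ for
$s \in T^j$; i.e., there is no decay in the parameter $j$.) This is already enough:

\[ C^2 \le \sum_{j} \sum_{s \in T^j} |\langle f 1_{C_k}, \varphi_s \rangle|^2 \lesssim 2^{-Nk} \| f 1_{\Omega_k} \|_2^2, \]

which finishes the proof of the lemma. □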
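
The estimate for $B$ above has the self-improving shape "$B^2 \lesssim (\text{small}) \cdot B$". The following sketch spells out the absorption step and how the two pieces recombine. It assumes the bound for $B$ is read as $B^2 \le c_0\, 2^{-Nk/2} \| f 1_{\Omega_k} \|_2\, B$ with an unspecified constant $c_0$ (a label introduced here), and that $B$ is finite.

```latex
% c_0 is a hypothetical name for the implicit constant; B, C are the two
% square sums from the proof of Lemma 36.
\begin{align*}
B^2 \le c_0\, 2^{-Nk/2}\, \| f 1_{\Omega_k} \|_2\, B
  \quad &\Longrightarrow \quad
  B \le c_0\, 2^{-Nk/2}\, \| f 1_{\Omega_k} \|_2
  \qquad (\text{divide by } B \text{ if } 0 < B < \infty;\ B = 0 \text{ is trivial}), \\
  &\Longrightarrow \quad
  B^2 \le c_0^2\, 2^{-Nk}\, \| f 1_{\Omega_k} \|_2^2, \\
\sum_{s \in T} |\langle f 1_{\Omega_k}, \varphi_s \rangle|^2
  \le (B + C)^2 &\le 2 B^2 + 2 C^2
  \lesssim 2^{-Nk}\, \| f 1_{\Omega_k} \|_2^2 .
\end{align*}
```

Since $N$ is arbitrary, losing constant factors (or halving $N$) along the way is harmless.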



13. Square function estimates 

In this section we prove Lemma 28. The proof is similar to the standard
proof of $L^p$ boundedness for the analogous one-dimensional square function,
with a few tweaks to handle the two-dimensionality. For notational
convenience we will assume, without loss of generality, that the tree $T$ has top
that is axis parallel and centered at the origin. Proving the lemma with
the spatial localization requires us to decompose $\Delta$ spatially as follows. For
$k \ge 1$, define the set $\Omega_k = 2^k\,\mathrm{top}(T) \setminus 2^{k-1}\,\mathrm{top}(T)$. For $k = 0$, define
$\Omega_k = \mathrm{top}(T)$. Now define

\[ \Delta_k(f) = \Big( \sum_{s \in T} |\langle f 1_{\Omega_k}, \varphi_s \rangle|^2 \frac{1_s}{|s|} \Big)^{1/2}. \]

By Minkowski's inequality, we have

\[ \Delta f(x) = \Big( \sum_{s \in T} \Big| \sum_k \langle f 1_{\Omega_k}, \varphi_s \rangle \Big|^2 \frac{1_s(x)}{|s|} \Big)^{1/2} \le \sum_k \Delta_k f(x) \]

pointwise, so again by Minkowski's inequality we have

\[ \| \Delta f \|_p \le \sum_k \| \Delta_k f \|_p. \]

We will prove that for any $N$,

\[ \| \Delta_k f \|_p \lesssim 2^{-Nk} \| 1_{\Omega_k} f \|_p. \tag{13.1} \]
With this, we can use Holder's inequality to see that for any N, we have 
l|A/|| P < 



- ' ' \PN,Tf\ P ' " 



where 0n,t is the function defined in the statement of Lemma 28, which 
finishes the proof of Lemma 28. It remains to prove (13.1). Note that
Lemma 36 is exactly this when $p = 2$. By interpolation, it is enough to
prove the following weak type estimate:

\[ |\{ \Delta_k f > \lambda \}| \lesssim 2^{2k} \frac{\| f \|_1}{\lambda}. \]
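
To see why a weak type bound that grows like $2^{2k}$ is still acceptable, one can track the constants through the interpolation. The sketch below is an illustration under stated assumptions, not the paper's argument: it assumes the Marcinkiewicz interpolation theorem is applied to the sublinear operator $f \mapsto \Delta_k f$ with weak $(1,1)$ constant $A_1 \sim 2^{2k}$ and strong $(2,2)$ constant $A_2 \sim 2^{-Nk/2}$ (from Lemma 36), and it only covers $1 < p < 2$. The names $A_1$, $A_2$, $\theta$ are introduced here.

```latex
% Hypothetical names: A_1 (weak (1,1) constant), A_2 (strong (2,2) constant),
% theta (interpolation parameter). Valid for 1 < p < 2.
\begin{align*}
\frac{1}{p} &= \frac{\theta}{1} + \frac{1 - \theta}{2}
  \quad \Longleftrightarrow \quad \theta = \frac{2}{p} - 1 \in (0,1), \\
\| \Delta_k f \|_p &\lesssim_p A_1^{\theta} A_2^{1 - \theta}\, \| f \|_p
  \sim 2^{\,k \left( 2\theta - \frac{N}{2}(1 - \theta) \right)}\, \| f \|_p .
\end{align*}
```

For fixed $p$ the factor $1 - \theta$ is a fixed positive number, so taking $N$ large makes the exponent as negative as desired; this is consistent with (13.1) holding "for any $N$".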

By dividing the function $f$ into $\lesssim 2^{2k}$ pieces, we may assume the support of
$f$ is contained in a translate of $\mathrm{top}(T)$. With this assumption, it is enough
to prove for such $f$ that

\[ |\{ \Delta_k f > \lambda \}| \lesssim \frac{\| f \|_1}{\lambda}. \]

Our argument proceeds more or less by the usual path of Calderon-Zygmund 
decomposition. 

Denote by Rk the rectangle with same center and length as R but 2 k 
times the height. Let B be the collection of maximal rectangles of width w 
taken from the collection such that 



/ l/l > 2 5fc A, 

JRu 



\ R k\ jR k 

and for each R € B , let R' = m(R) x 7r 2 (Ctop(T)). Then let B = {R' : R G 

B}. We can see already that J2 Re g\ R \ ^ Er g bI-R| ^ This follows 

from the weak (1,1) inequality for the Hardy-Littlewood maximal function, 
which holds for rectangles of fixed width: if we write, for k > 0, 
then we have



£1*1 

RGB 



£ 2 * 

fc>0 

1 



< 

A 



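For each $(x, y) \in R$, let

\[ b(x, y) = f(x, y) - \frac{1}{|\pi_1(R)|} \int_{\pi_1(R)} f(z, y)\, dz. \]

Note that by definition we have for each $y \in \pi_2(\mathrm{top}(T))$ that

\[ \int_{\pi_1(R)} b(x, y)\, dx = 0. \]

We also have the following helpful fact:

Claim 38. For each $y \in \pi_2(C\,\mathrm{top}(T))$, we have

\[ \frac{1}{|\pi_1(R)|} \int_{\pi_1(R)} |f(z, y)|\, dz \le C \lambda. \]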

Proo/ o/ Claim. Note that / is supported in the annulus of width ^. Let k 
be a function such that = 1 for £ G [— 4w/,4u/]. Then 



f(x,y) = J f(x,w)k(y - w)dw, 



so 

1 



, |/(*M/)|d« = i — 7^x7/ I f(z,w)k(y-w)dw\dz 

Because rapidly decays away from a rectangle of height w, if we denote by 
i?^ the rectangle with same center and length as R but 2 k times the height, 
then 

/ | / f(z,w)k(y — w)dw\dz 

< ' 

Fi 

< A, 

where the last inequality is by assumption on R. □ 

With this claim, we define

\[ g(x, y) = f(x, y) \quad \text{for } (x, y) \notin \bigcup_{R \in B} R \]



jmf T fM*- iah dwdz 

( R )\ J MR) V 2 ^ 
and

\[ g(x, y) = \frac{1}{|\pi_1(R)|} \int_{\pi_1(R)} f(z, y)\, dz \quad \text{for } (x, y) \in R \in B. \]

Note that by the claim we have $g(x, y) \lesssim \lambda$ for $(x, y) \in R$. Further, for
almost every $(x, y) \notin \bigcup_{R \in B} R$ such that $g(x, y) = f(x, y) \gg \lambda$, there exists
a horizontal line segment $L$ through $(x, y)$ such that $\frac{1}{|L|} \int_L f \gg \lambda$, which
implies there is a rectangle of width $w$ containing $(x, y)$ on which the average
of $f$ is larger than $\lambda$, contradicting our assumption that $(x, y) \notin \bigcup_{R \in B} R$.
Hence $g \lesssim \lambda$ almost everywhere.

To see the purpose of including the rectangles $5CR'$ in the exceptional set
(rather than a small dilate of $R$ itself), consider a rectangle $R$ north of the
tree $T$, and a mean zero function $h$ supported on $R$. Analysis of $\int_{(5CR')^c} \Delta h$
is a bit more complicated than in the one-dimensional case because the
collection $\{ \varphi_s \}_{s \in T}$ has no orthogonality in the vertical direction. However
by excluding $R'$, we need only consider small tiles $s$ supported away from
the vertical translate of $5CR$, allowing us to take advantage of the spatial
decay (in the horizontal variable) of the functions $\varphi_s$.

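With this modification, the proof now proceeds as expected: Use the fact
that $|g| \lesssim \lambda$, together with the $L^2$ estimate on $\Delta$ to see

\[ |\{ \Delta_k g > \lambda \}| \lesssim \frac{\| g \|_2^2}{\lambda^2} \lesssim \frac{\| f \|_1}{\lambda}. \]

Additionally, by the Chebyshev and triangle inequalities, together with
sublinearity of $\Delta_k$, we have

\[ \Big| \Big\{ x \notin E : \Delta_k \Big( \sum_R b_R \Big) > \lambda \Big\} \Big| \le \frac{1}{\lambda} \sum_R \int_{E^c} |\Delta_k(b_R)|. \]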
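To finish the proof we show that for each $R \in B$, we have

\[ \int_{(5CR')^c} |\Delta_k(b_R)| \lesssim \int |b_R|, \tag{13.2} \]

which will give us that

\[ \Big| \Big\{ x \notin E : \Delta_k \Big( \sum_R b_R \Big) > \lambda \Big\} \Big| \lesssim \frac{1}{\lambda} \sum_R \int |b_R| \lesssim \frac{\| f \|_1}{\lambda}. \]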

Once again, to prove (13.2), we essentially follow the one-dimensional 
argument, dealing with a few extra nuisances along the way. A reader 
having trouble seeing through the technicalities should note that all of the 
computations below are essentially the same as in the one-dimensional case. 
The problem is understanding why the present situation is essentially the 
same as the one-dimensional case. More specifically, to prove (13.2), it is 
convenient to make a few simplifying (and valid) assumptions. For each 
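parallelogram $s \in T$ define

\[ \tilde{s} = \pi_1(s) \times C \pi_2(\mathrm{top}(T)). \]

Since $s \subseteq \tilde{s}$, it is clear that if we define

\[ \tilde{\Delta}_k f = \Big( \sum_{s \in T} |\langle f 1_{\Omega_k}, \varphi_s \rangle|^2 \frac{1_{\tilde{s}}}{|s|} \Big)^{1/2}, \tag{13.3} \]

then $\Delta_k f \le \tilde{\Delta}_k f$ pointwise. For each $s \in T$, we know that $\pi_1(\tilde{s})$ is contained
in the union of two dyadic intervals $\tilde{s}_l$ and $\tilde{s}_r$, each of size $\lesssim |\pi_1(s)|$. Further,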
because the set of tiles of a given size and orientation partition R 2 (i.e., for 
each w £ D, we have Ufietc ^ = ^ 2 > see ^ e definitions in Section 2), and 
because $|\pi_1(s)| \ge |\pi_2(s)|$ we know that for any dyadic interval $I$, there are
$\lesssim 1$ tiles $s \in T$ such that $I = \pi_1(\tilde{s}_l)$ or $I = \pi_1(\tilde{s}_r)$. All of this allows
us to assume (possibly after dividing $T$ into $\sim 1$ pieces) that the tiles $s$ are
parameterized by dyadic intervals, and that for each $x \in C\,\mathrm{top}(T)$, and each
dyadic interval $I$, there is at most one $s \in T$ such that $x \in \tilde{s}$ and $\pi_1(\tilde{s}) = I$.

To prove (13.2), we split the sum inside $\tilde{\Delta}_k$ into two pieces, one over tiles
whose vertical projection is smaller than the length of $R$, and the other over
tiles whose vertical projection is larger than the length of $R$. We begin by
controlling the sum over smaller tiles. Note that the dominant term in both
cases comes from tiles such that $|\pi_1(s)| \sim |\pi_1(R)|$. In the integral below, we
need only consider $x \in C\,\mathrm{top}(T)$ such that $\pi_1(x) \notin \pi_1(5CR)$. This allows
us to prove the desired estimate using spatial decay alone. Further, since
$1_{\tilde{s}}(x)$ is constant on vertical segments projecting to $\pi_2(C\,\mathrm{top}(T))$, we have

E \«>*><p.)\ 2 h 

x eKto P (T)n(5CR>r \ l7Tl{s)l < lni{R)l 



< 

> X 



L 



b R \\\ f\x-c(R) 



<=Ktop(T)n(5CR>) c \ \ R \ 2 V kiC-R)! 
- I 

)l JteNL: 




< JMl/" 1 dt 

< INIi- 



We emphasize that the integral in the second-to-last line is one-dimensional.
It remains to control the sum over the tiles with vertical projection larger
than $|\pi_1(R)|$. This requires using the mean-zero-along-horizontal-line-segments
property of the function $b_R$. Note that for any smooth function $h$, we have



\[
\begin{aligned}
\langle b_R, h \rangle &= \int_{y \in \pi_2(R)} \int_{x \in \pi_1(R)} b_R(x, y)\, h(x, y)\, dx\, dy \\
&\le \int_{y \in \pi_2(R)} \int_{x \in \pi_1(R)} |b_R(x, y)|\, |h(x, y) - h(c(\pi_1(R)), y)|\, dx\, dy.
\end{aligned}
\]

Our goal is to apply this to the wave packets $\varphi_s$. Specifically, we will show

Claim 39.

I/. \ I < I,, || 1 Mg)l - (, f \x-c(R)\ Y W \ 

Proof. We must deal with a small technicality here: the tiles s need not be 
precisely axis parallel, but fortunately they are close. Precisely, we have that 
the vertical component (when using the coordinate frame of s) of (x, y) — 
( c tti(R)> y) is l ess than • Of course we have the horizontal component 

(when using the coordinate frame of s) of (x, y) — (cr, y) is less than |7Ti(i?)|. 
Further, we know that 

D 1 (p s (x,y) < —7=-, — r^T^-vC 



7Tl («) | dx |vTi(s)| ' 

1 1 d x y 
D 2 <p s {x,y) < —=-—<p{- — pr,,— ). 



Hence 



<p s (x,y) - ^ s (c(7Tl(i?)),y) < -— -=-—<£{- prr,— , 



+ fi(«) — 7r=f 1 — rriF^i — TX\^-- 

\K\[S)\ ox \k\[s)\ w 



□ 



The claim yields, writing T = iftop(T) n (5Ci?') c , 



Ani(s)\>\ni(R)\ 



,l 5 (s) 



< 



< 



^||6fl||iki(i2)| 



|7ri( S )|>|7ri(fi)| 



V 



mm 



A / |x-c(fl)| \- 10 \ 

^Ht^tJ ) 

\lTl{s)\\s\^ 



y V 

1.5 (X 



dx 



> J 



\\b R \\i. 



This completes the proof of (13.2) and thus the proof of Lemma 28. 



14. BMO type estimates for the square function

In this section we prove Lemma 29. As in the previous section, we consider
the related operator $\tilde{\Delta}$. See (13.3) for the definition, as well as the discussion
immediately following the definition for several simplifying assumptions that
we make. To prove the Lemma, we prove the following key claim. Here, and
in the rest of the proof, we write $\sigma = \mathrm{size}(T)$; note that we also have



<j ~ 



sgT / 

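As in the last section, we consider a slightly modified version of $\Delta$: define

\[ \tilde{\Delta} f = \Big( \sum_{s \in T} |\langle f, \varphi_s \rangle|^2 \frac{1_{\tilde{s}}}{|s|} \Big)^{1/2}, \]

where the rectangles $\tilde{s}$ are defined immediately above (13.3).

Claim 40.

\[ |\{ \tilde{\Delta} f > \sigma n \}| \lesssim 2^{-n^2} |\{ \tilde{\Delta} f > \sigma \}|. \]

(Of course we do not need the full exponential-squared decay, but we do have
it.)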

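With the Claim, we are almost done:

\[
\begin{aligned}
\| \tilde{\Delta} f \|_2^2 &\lesssim \int_{\{ \tilde{\Delta} f < \sigma \}} (\tilde{\Delta} f)^2 + \sum_{n=1}^{\infty} (\sigma n)^2\, |\{ \tilde{\Delta} f > n \sigma \}| \\
&\lesssim \int_{\{ \tilde{\Delta} f < \sigma \}} (\tilde{\Delta} f)^2 + \sum_{n=1}^{\infty} (\sigma n)^2\, 2^{-n^2}\, |\{ \tilde{\Delta} f > \sigma \}| \\
&\lesssim \int_{\{ \tilde{\Delta} f < \sigma \}} (\tilde{\Delta} f)^2 + \sigma^2\, |\{ \tilde{\Delta} f > \sigma \}| \\
&\le \sigma \int_{\{ \tilde{\Delta} f < \sigma \}} \tilde{\Delta} f + \sigma \int_{\{ \tilde{\Delta} f > \sigma \}} \tilde{\Delta} f.
\end{aligned}
\]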

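With this, we see that

\[ \sigma^2 |\mathrm{top}(T)| \sim \| \tilde{\Delta} f \|_2^2 \lesssim \sigma \int \tilde{\Delta} f, \]

which proves that

\[ \| \tilde{\Delta} f \|_2 \sim \sigma |\mathrm{top}(T)|^{1/2} \lesssim \frac{1}{|\mathrm{top}(T)|^{1/2}} \int \tilde{\Delta} f, \]

which is what we need. It remains to prove the claim.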
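
Two elementary facts are used silently in the chain of inequalities above: the series coming from Claim 40 converges, and the level-set terms can be traded for integrals of the square function. A short sketch, writing $\tilde{\Delta} f$ for the modified square function and $\sigma = \mathrm{size}(T)$ (the explicit value 6 below is only a convenient upper bound, not a constant from the paper):

```latex
\begin{align*}
\sum_{n \ge 1} n^2\, 2^{-n^2}
  &\le \sum_{n \ge 1} n^2\, 2^{-n}
  = \frac{x(1+x)}{(1-x)^3} \Big|_{x = 1/2} = 6 < \infty
  && (\text{since } n^2 \ge n), \\
(\tilde{\Delta} f)^2\, 1_{\{ \tilde{\Delta} f < \sigma \}}
  &\le \sigma\, \tilde{\Delta} f\, 1_{\{ \tilde{\Delta} f < \sigma \}}
  && (\text{one factor is } < \sigma), \\
\sigma^2\, |\{ \tilde{\Delta} f > \sigma \}|
  &= \sigma \int_{\{ \tilde{\Delta} f > \sigma \}} \sigma
  \le \sigma \int_{\{ \tilde{\Delta} f > \sigma \}} \tilde{\Delta} f
  && (\text{Chebyshev-type bound}).
\end{align*}
```

Adding the last two lines gives $\| \tilde{\Delta} f \|_2^2 \lesssim \sigma \int \tilde{\Delta} f$, which is the inequality used in the final step.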

Proof of Claim 40. Of course to prove the claim it is enough to show that

\[ |\{ \tilde{\Delta} f > \sqrt{n}\, \sigma \}| \lesssim 2^{-n} |\{ \tilde{\Delta} f > \sigma \}|, \]

and this is equivalent to showing

\[ |\{ (\tilde{\Delta} f)^2 > n \sigma^2 \}| \lesssim 2^{-n} |\{ (\tilde{\Delta} f)^2 > \sigma^2 \}|, \]

which can be shown in a rather straightforward manner following the proof
of the John-Nirenberg inequality. Recall that for each dyadic $I$ we have an
associated tile in $T$, which we call $s(I)$. For notational convenience, define
for intervals $I, K$

ai,K(x)= £ K/>^)>| 2 U#- 
We first note that for any $K$, if $I$ is a maximal interval on which

\[ a_{I,K} > m \sigma^2, \]

then we know

\[ a_{I,K} \le (m + 2) \sigma^2, \]

since

\[ \frac{|\langle f, \varphi_s \rangle|^2}{|s|} \lesssim \sigma^2. \]

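We begin by defining a collection of intervals $\mathcal{I}_0$:

\[ \mathcal{I}_0 = \{ \text{maximal dyadic } I : a_{I, \pi_1(C\,\mathrm{top}(T))} > 100 \sigma^2 \}. \]

Then having defined $\mathcal{I}_{n-1}$, define for any $K \in \mathcal{I}_{n-1}$

\[ \mathcal{I}_n(K) = \{ \text{maximal dyadic } I : a_{I,K} > 100 \sigma^2 \}, \qquad \mathcal{I}_n = \bigcup_{K \in \mathcal{I}_{n-1}} \mathcal{I}_n(K). \]

We remark that for any $K$,

\[ \Big| \bigcup_{I \in \mathcal{I}_n(K)} I \Big| \le \frac{1}{2} |K|. \]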

To see this we only need to use Chebyshev and the estimate on size(T): 



U ' 

ieln(K) 



JCK 



~ 10 1 '' 

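where the last inequality is due to the estimate on size(T). Similarly,

\[ \Big| \bigcup_{I \in \mathcal{I}_0} I \Big| \le \frac{1}{2} |\pi_1(C\,\mathrm{top}(T))|. \]

Putting together all $K$ in $\mathcal{I}_{n-1}$ gives us that

\[ \Big| \bigcup_{I \in \mathcal{I}_n} I \Big| \le \frac{1}{2} \Big| \bigcup_{I \in \mathcal{I}_{n-1}} I \Big|, \]

and iterating this gives us that

\[ \Big| \bigcup_{I \in \mathcal{I}_n} I \Big| \le 2^{-n} \Big| \bigcup_{I \in \mathcal{I}_0} I \Big|, \]

which proves Claim 40 since

\[ (\tilde{\Delta} f)^2(x) \lesssim n \sigma^2 \]

for $x$ such that $\pi_1(x) \notin \bigcup_{I \in \mathcal{I}_n} I$. □

15. Appendix: The case p > 2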



In this appendix we briefly discuss the proof of Theorem 1 for p > 2, 
which is essentially the proof in [4]. 

Following the tree decomposition of Section 4 and the remarks in Section 
5, we need to show 

EE E ^ito P (T)i < ifhei 1 -^. 

This time we care most about $p$ close to $\infty$. We may assume $|E| < |F|$
because if $|E| > |F|$ then we may apply the previous arguments for the case
$p < 2$. We emphasize here that there is no circularity. Both the argument
in this section (in which we assume $|E| < |F|$) and the argument in the
bulk of the paper (in which we assume $|E| > |F|$) work when $p = 2$. Hence
the $p = 2$ case of the estimate in (4) is established for arbitrary $E$, $F$. This
allows us to assume $|E| < |F|$ in this section, where $p > 2$, and allows us to
assume $|E| > |F|$ in the earlier part of the paper, where $p < 2$.
By Estimates 11 and 12 it suffices to prove

E ^min(^,M)<| F |^|i-| (15.1) 

for p > 2. 

The following simple estimate will be helpful: 
Claim 41. For any 5, we have 

^Samin^^)<^EW\- 

a 

Proof. We need only observe that the two terms in the minimum are equal 
when a = y^j^j an d split the sum over a accordingly. □ 

We split the sum (15.1) in 5 into two pieces, with the dividing line being 

\E\ 

S = For smaller 5, we use Claim 41 above: 

E E^ min (x'5 } ~ £ 
$\lesssim |E|$.

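For larger $\delta$, we use the estimate size $\lesssim 1$:

Claim 42. If the function in the definition of size(T) is called $f$, then

\[ \mathrm{size}(T) \lesssim \| f \|_\infty. \]

Of course we are using $f = 1_F$, which proves that here size(T) $\lesssim 1$.

Proof. Define

\[ \Omega_0 = \mathrm{top}(T), \qquad \Omega_k = 2^k\,\mathrm{top}(T) \setminus 2^{k-1}\,\mathrm{top}(T) \quad \text{for } k \ge 1. \]

We need only note that for any 1-tree $T$, by Lemma 36,

\[ \sum_{s \in T} |\langle f, \varphi_s \rangle|^2 \lesssim \sum_k 2^{-Nk} \| f 1_{\Omega_k} \|_2^2 \lesssim \| f \|_\infty^2 |\mathrm{top}(T)| \]

since $|\Omega_k| \le 2^{2k} |\mathrm{top}(T)|$. This proves the claim. □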
Hence 

£J>iS<| £ |io g m. 

- 1^ 

Combining these two estimates proves (15.1) since $|E| < |F|$.